Summary


This report is a case study in data refinement, that is, the process of taking a formal
specification written in terms of abstract values and converting it into a concrete
implementation. The report concludes with a discussion of the strengths and weaknesses of the formal
development process.
1 Introduction


The  trustworthiness  of  high  integrity  software  is  established  by  demonstrating  that 
the  software  satisfies  its  specification.  For  certified  software,  the  demonstration 
will  be  directed  towards  an  independent  evaluator  who  has  to  judge  whether  the 
software  possesses  the  properties  claimed  for  it.  When  the  demonstration  involves 
the  use  of  formal  methods  the  term  refinement  is  used,  standing  for  the  process  of 
producing  the  implementation  in  such  a  way  that  formal  proof  of  satisfaction  could 
be  given.  Consequently,  refinement  is  an  important  technique  for  the  production  of 
software  to  the  higher  levels  of  assurance. 

Refinement  is  usually  associated  with  the  final  stages  of  producing  an 
implementation.  This  arises  from  the  need  to  use  simple  examples  when  explaining 
the  concept,  which  are  consequently  rather  close  to  the  implementation  language 
chosen.  Most  specifications  are  far  removed  from  implementation  languages,  so  the 
first  steps  of  formal  design  will  nearly  always  involve  refining  the  specification. 
These  steps  may  be  carried  out  within  the  specification  language  itself  and 
represent  the  major  part  of  the  creative  design.  Refinement  is  not  an  automatic 
method  for  the  generation  of  an  implementation  from  a  specification  and  a  key  issue 
in  its  use  is  the  extent  to  which  it  actually  helps  an  implementor  to  produce 
trustworthy  software.  Another  key  issue  is  the  extent  to  which  the  formality  can  help 
in  demonstrating  trustworthiness.  In  principle  a  formal  proof  forms  a  convincing 
argument but the details of the formalism may obscure the understanding of what is
being  proved.  By  hiding  detail  and  automatically  checking  proof  steps,  tools  could 
assist  the  process  but  with  current  technology  the  degree  to  which  a  proof  helps 
rather  than  hinders  the  evaluation  is  an  open  question.  The  purpose  of  this  report  is  to 
discuss  these  issues  within  the  context  of  a  realistic  case  study. 

Section  2  presents  a  formal  specification  of  the  problem,  which  is  concerned  with 
pattern  matching  in  the  programming  language  ML.  This  is  a  problem  which  is 
interesting  in  its  own  right,  but  the  formal  refinement  has  a  number  of  surprises 
which  demonstrate  the  power  of  the  method  over  the  use  of  an  intuitive  approach. 

Sections  3  to  5  deal  with  the  formal  development  of  the  design  specification. 
Section  3  deals  with  the  choice  of  representation  for  the  implementation.  For  the 
refinement method, the relationship between the entities in the abstract
specification  and  those  in  the  concrete  implementation  must  be  given  and  this  is 
known  as  the  abstraction  invariant.  Given  this,  the  constraints  on  the 
implementation  functions  can  actually  be  calculated,  and  this  is  done  in  section  4. 
With  these  in  mind,  section  5  deals  with  the  design  of  the  implementation 
algorithms  at  an  abstract  level  which  ensures  that  they  are  compatible  with  the 
specification.  It  is  this  aspect  of  abstract  algorithm  design  where  refinement  has 
most  to  offer  the  implementor  as  opposed  to  the  evaluator. 

Section 6 covers the evaluator's requirements. It discusses the type of proof document
which would illuminate the discussion rather than obscuring it and gives an example
of the way in which the salient features of the proof requirements could be
displayed.

Section  7  provides  an  implementation  of  the  problem  in  Algol68  which  may  be 
compared  with  the  formal  design  specification. 

Finally  section  8  discusses  the  advantages  and  disadvantages  of  the  formal  approach 
and  gives  recommendations  for  current  practice. 

As  some  readers  may  not  be  familiar  with  ML  the  remainder  of  this  introduction 
describes  the  pattern  matching  features  of  the  language.  ML  was  produced  as  a 
spin-off  from  the  theorem  proving  system  LCF  [Gordon  et  al  1979,  Paulson  1987].  In  the 
latter,  theorems,  proof  rules,  tactics  and  so  on  are  manipulated  by  means  of  an 
interactive  meta  language.  As  the  ideas  for  this  crystallised  it  was  realised  that  the 
best  meta  language  was  in  fact  a  programming  language  enriched  with  a 
polymorphic  type  discipline  and  exception  handling.  The  language  could  be  used 
quite  generally  in  a  similar  manner  to  Lisp.  The  LCF  idea  produced  a  number  of 
offspring,  and  varying  dialects  of  ML  began  to  appear.  As  a  result,  Milner  convened 
the  ML  community  with  a  view  to  developing  a  standard  [Harper  et  al  1988].  The  new 
language  is  larger  and  more  powerful  than  its  predecessors  and  is  interesting  in  its 
own right, quite apart from its exciting use with theorem proving.

An  attractive  feature  of  the  language  is  the  use  of  pattern  matching  in  the  specification 
of  functions.  In  ML,  a  function  may  be  supplied  as  a  series  of  clauses,  each  clause 
specifying  how  values  of  a  certain  structure  are  to  be  handled  by  the  function  as  a 
whole.  This  is  best  illustrated  by  means  of  an  example,  for  which  the  ML  concepts 
of  datatypes,  patterns  and  pattern  matching  will  be  needed. 

The  ML  datatype  declaration  is  similar  to  a  disjoint  union  and  is  a  generalisation  of 
the  idea  of  an  enumerated  type  in  Pascal.  The  simplest  datatype  declarations  take 
the  form 

datatype  colour  =  red  |  blue  |  green 

which  declares  colour  as  a  type  containing  the  three  values  red,  blue  and  green.  More 
usually,  the  datatype  is  constructed  from  previously  defined  types,  as  in  the 
following  declaration: 

datatype label = code of colour | number of int*int | name of string

Thus  a  label  value  is  constructed  either  from  a  colour,  a  pair  of  integers  or  a  string. 
Instead  of  being  constants  as  in  the  previous  example,  code,  number  and  name  are 
constructor  functions  which  construct  values  having  type  label.  So 
code(red), number(1, 2) and name "fred" would all be values of type label.

An  ML  pattern  is  either  a  variable,  an  expression  involving  constants,  constructor 
functions and variables or is a tuple of patterns. Thus, given variables x, i, j, then x,
number(i, j) and code(red) are all patterns. A pattern matches a set of values. A
variable  will  match  any  value,  but  tuple  patterns  and  patterns  involving  constructors 
only  match  values  which  have  the  corresponding  structure.  Thus  x  matches  any  value, 
number(i,  j)  matches  any  label  value  constructed  from  two  integers  and  code(red) 
matches  the  single  value  it  denotes. 

A  simple  example  of  a  function  declared  as  a  series  of  clauses  is  one  which 
permutes  the  colours: 

fun colour_perm red = blue
  | colour_perm blue = green
  | colour_perm green = red

This  is  both  clearer  and  more  concise  than  the  corresponding  expression  involving 
conditionals.  However,  this  expressiveness  has  been  bought  at  the  cost  of  some 
additional  complexity  in  an  ML  compiler  because  it  is  necessary  to  check  that  the 
patterns  supplied  in  the  parameter  position  account  for  all  the  possible  values  to 
which the function might be applied. Thus colour_perm is a function from colour to
colour,  but  if  the  last  clause  of  its  definition  had  been  omitted  it  would  only  have 
been  applicable  to  red  and  blue  values.  Consequently,  it  is  a  requirement  on  ML 
compilers  that  they  should  be  able  to  check  whether  a  set  of  patterns  is  exhaustive,  that 
is,  the  patterns  match  every  possible  value  belonging  to  a  type.  A  related  problem  is 
that  the  addition  of  a  clause  to  a  function  may  be  redundant,  that  is,  it  may  not  increase 
the  number  of  values  already  matched.  In  this  case  the  clause  is  superfluous  and  the 
programmer  has  made  a  mistake. 

The  two  tests  required  are  intuitively  obvious  in  the  simple  case  above,  but  when 
constructor  functions  and  tuple  values  are  considered,  intuition  can  be  misleading.  In 
this  case  the  formal  specification  is  particularly  helpful  and  it  is  presented  here 
using  the  specification  language  Z  [Sufrin  1983,  Hayes  1987,  Spivey  1988]. 

Notation:  Sections  2  to  6  of  this  report  are  Z  documents  each  of  which  has  been 
mechanically  type-checked.  The  conclusion  of  each  document  is  marked  by  a 
"keeps"  statement  which  lists  the  identifiers  exported  by  the  document.  A  document 
is  imported  into  another  one  by  including  the  document  name,  which  is  distinguished 
by  having  a  box  drawn  round  it. 


2  Z  specification  of  the  problem 

2.1 Specification of ML values and types

The  specification  is  defined  in  terms  of  the  operations  a  compiler  must  perform  to 
carry  out  the  test.  A  single  pass  of  compilation  is  envisaged,  so  three  operations  are 
defined,  an  initial  operation  used  to  specify  the  compiler's  action  on  encountering  the 
first  pattern  of  the  clausal  function,  a  testing  operation  to  specify  the  action  required 
on  encountering  the  subsequent  patterns  and  a  final  operation  to  test  whether  the  set 
of  patterns  is  exhaustive.  The  parameters  for  the  second  operation  are  the  new 
pattern  just  compiled  and  the  set  of  values  matched  by  the  patterns  obtained  from 
the  other  clauses  of  the  function  (the  other  patterns  in  the  match,  in  ML  terms).  The 
check  to  be  made  is  whether  the  set  of  values  matched  increased  as  a  result  of 
adding  the  latest  pattern,  in  which  case  the  match  is  not  redundant.  For  the  third 
operation,  the  parameter  is  the  set  of  values  accounted  for  by  all  the  patterns  in  the 
match,  and  the  test  is  whether  this  is  all  the  values  belonging  to  the  type,  in  which 
case  the  set  of  patterns  is  exhaustive. 

The  problem  is  defined  in  terms  of  ML  values  so  it  is  necessary  to  build  a  Z 
representation  of  them.  In  this  a  number  of  significant  simplifications  may  be  made. 
First  of  all,  the  treatment  of  the  special  constants  (denotations  for  integers,  reals  and 
strings)  is  the  same  for  each  type  of  constant,  so  they  are  all  represented  together  by 
the  Z  given  set  SCONST.  Secondly,  function  values  may  only  be  matched  by 
variables,  which  by  definition  are  exhaustive.  As  the  test  is  trivial  in  this  case, 
function  values  are  not  considered  further  (they  may  be  considered  as  being 
members  of  SCONST).  Thirdly,  the  compiler's  representation  of  constructors  is 
irrelevant  to  the  specification  of  the  problem,  so  the  constructors  are  simply 
introduced  as  a  given  set  CON.  Fourthly,  the  ML  record  type  will  simply  be  treated  as 
though  it  were  the  corresponding  tuple.  Finally,  polytypes  will  be  ignored.  A 
polymorphic  match  will  be  exhaustive  or  redundant  at  each  and  every  instance  of  its 
type,  so  the  problem  is  independent  of  the  polymorphism. 

With  these  simplifications,  an  ML  value  is  either  the  special  ML  void  or  unit  value 
or  is  drawn  from  the  set  of  special  constants  or  is  a  construction  or  a  tuple  of  values. 
For  the  Z  representation,  a  labelled,  disjoint  union  will  be  used,  but  as  the  definition 
is  recursive,  Value  is  introduced  as  a  given  set.  Thus  values  will  be  built  up  from  the 
following  given  sets: 

[Value, SCONST, CON]

An  exhaustive  set  of  patterns  is  one  which  matches  every  value  of  a  given  type  and 
so  the  next  step  is  to  define  what  is  meant  by  a  type.  The  types  are  introduced  as 
disjoint  subsets  of  the  values,  with  the  actual  type  structure  being  given  by 
constraints  to  be  added  later: 


Type : ℙ (ℙ₁ Value)
─────────────────────────────────────────
⋃ Type = Value
∀ T₁, T₂ : Type | T₁ ≠ T₂ • T₁ ∩ T₂ = ∅

A  constructed  value  is  built  from  a  constructor  and  a  value  for  the  parameter.  The 
parameter  value  must  be  drawn  from  a  specific  type  which  was  associated  with  the 
constructor  in  the  datatype  declaration.  This  association  can  be  represented  by  the 
following  (unspecified)  function: 

type_of_con : CON → Type

Constructed  values  may  now  be  described  in  terms  of  the  following  schema: 

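── Cons_val ─────────────────────────────
con : CON; val : Value
─────────────────────────────────────────
val ∈ type_of_con(con)
─────────────────────────────────────────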


A  constructor  which  is  a  constant  will  be  treated  as  having  the  unit  value  as  a 
parameter. 

Tuples  will  simply  be  represented  as  sequences  of  two  or  more  values.  Tuples  of 
patterns  will  also  be  required  so  it  is  helpful  to  make  the  generic  definition: 

── [T] ──────────────────────────────────
tuple T == {s : seq T | #s > 1}
─────────────────────────────────────────


With  these  definitions,  the  structure  of  values  may  now  be  represented  by 

Value ::= unitv | sconstv «SCONST» | consv «Cons_val» | tupv «tuple Value»

Given  this  simplified  model  of  the  values,  the  type  structure  is  determined  only  by 
the  constructors  and  tuples.  The  contribution  a  constructor  makes  to  a  type  is 
represented  by  the  set  of  constructors  making  up  the  datatype  declaration.  This  is 
determined  for  each  constructor  and  available  to  the  compiler,  so  it  is  possible  to 
presume  the  existence  of  the  function  below: 

datatype : CON → ℙ CON
─────────────────────────────────────────
∀ cs₁, cs₂ : ran datatype • cs₁ = cs₂ ∨ cs₁ ∩ cs₂ = ∅

The  constraint  in  this  partial  specification  ensures  that  the  range  of  datatype  partitions 
the  set  of  constructors. 
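The partition property can be checked mechanically on a small example. The following Python sketch is illustrative only and is not part of the report's Z text: the constructor sets come from the colour and label declarations in the introduction, while the dictionary encoding and the function names are assumptions made here. Besides the quoted constraint, it checks that every constructor lies in some set of the range, which a partition also requires.

```python
from itertools import combinations

# Constructor sets for the two datatypes declared in the introduction.
COLOUR = frozenset({"red", "blue", "green"})
LABEL = frozenset({"code", "number", "name"})

# datatype : CON -> P CON, modelled as a total map on a small finite CON.
datatype = {con: cs for cs in (COLOUR, LABEL) for con in cs}


def ran_is_pairwise_disjoint(fn):
    """The quoted constraint: any two sets in ran fn are equal or disjoint."""
    return all(a == b or not (a & b) for a, b in combinations(set(fn.values()), 2))


def ran_covers_domain(fn):
    """Every constructor in dom fn belongs to some set in ran fn."""
    return set().union(*fn.values()) == set(fn)
```

With both checks satisfied, the sets in the range of the map form a partition of the constructors used in the example.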
The values making up a type all possess the same structure determined by whether
the values are tuples or constructed. A relation same_type on the values will be defined
which  specifies  this  property.  The  definition  will  be  given  by  cases  according  to  the 
structure  of  Value,  and  as  it  is  recursive,  the  signature  is  given  first: 

_same_type_ : Value ↔ Value

For  the  primitive  values,  unitv  forms  a  type  on  its  own  and  all  the  special  constants 
are considered to form one type. This is expressed by the schema:

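── Primitive ────────────────────────────
v, w : Value
─────────────────────────────────────────
v = unitv ∧ w = unitv
∨
v ∈ ran sconstv ∧ w ∈ ran sconstv
─────────────────────────────────────────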


Two  constructed  values  have  the  same  type  if  the  constructors  are  drawn  from  one 
constructor  set  in  the  range  of  datatype: 

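── Constructions ────────────────────────
v, w : Value
─────────────────────────────────────────
v ∈ ran consv ∧ w ∈ ran consv
datatype (consv⁻¹ v).con = datatype (consv⁻¹ w).con
─────────────────────────────────────────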


As  a  consequence  of  this  it  is  possible  to  deduce  the  set  of  constructors  associated 
with  a  constructed  type: 

Constructors == λ ty : Type | ty ⊆ ran consv • {v : ty • (consv⁻¹ v).con}

Finally,  tuple  values  have  the  same  type  if  they  are  the  same  length  and  the 
corresponding  values  have  the  same  type: 

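── Tuples ───────────────────────────────
v, w : Value
─────────────────────────────────────────
v ∈ ran tupv ∧ w ∈ ran tupv
#vs = #ws
∀ i : 1 .. #vs • (vs i) same_type (ws i)
where
vs == tupv⁻¹ v
ws == tupv⁻¹ w
─────────────────────────────────────────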


As a tuple type is made up of values of the same tuple length it is possible to define
a Size function to deliver it:

Size == λ ty : Type
        | ty ⊆ ran tupv
        • μ n : ℕ | (∀ v : tupv⁻¹⦇ty⦈ • n = #v) • n

The definition of same_type may now be completed as

∀ v, w : Value • v same_type w ⇔ Primitive ∨ Constructions ∨ Tuples

The set of types is simply the set of equivalence classes of same_type:

∀ ty : Type • ∀ v₁, v₂ : ty • v₁ same_type v₂

Consequently, it is possible to deduce the type of a value:

type_of == λ v : Value • {w : Value | w same_type v}
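The three cases of the relation can be read as a recursive function. The Python sketch below is an illustration under stated assumptions, not the report's definition: values are encoded as tagged tuples, and the dictionary DATATYPE names the declaration of each constructor instead of returning the constructor set itself. Both choices are made here for brevity.

```python
# Values as tagged tuples (an encoding assumed for this sketch):
#   ("unit",)  ("sconst", c)  ("cons", con, val)  ("tup", (v1, v2, ...))
UNIT = ("unit",)

# Stand-in for the function datatype: each constructor is mapped to a name
# identifying the declaration it belongs to.
DATATYPE = {
    "red": "colour", "blue": "colour", "green": "colour",
    "code": "label", "number": "label", "name": "label",
}


def same_type(v, w):
    """True when v and w have the same structure in the sense of section 2.1."""
    if v[0] != w[0]:
        return False
    tag = v[0]
    if tag in ("unit", "sconst"):
        # Primitive: unit is a type on its own; all special constants form one type.
        return True
    if tag == "cons":
        # Constructions: both constructors come from the same declaration.
        return DATATYPE[v[1]] == DATATYPE[w[1]]
    # Tuples: equal length and corresponding elements of the same type.
    vs, ws = v[1], w[1]
    return len(vs) == len(ws) and all(same_type(a, b) for a, b in zip(vs, ws))


RED = ("cons", "red", UNIT)
BLUE = ("cons", "blue", UNIT)
CODE_RED = ("cons", "code", RED)
```

For instance, same_type(RED, BLUE) holds, whereas same_type(RED, CODE_RED) does not, because red and code belong to different declarations.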

2.2  Specification  of  ML  patterns  and  pattern  matching 

Patterns  are  formed  in  a  similar  way  to  values,  except  that  one  of  the  primitive 
patterns  is  a  variable  which  matches  any  value.  As  with  values,  simplifications  will 
be  made:  the  ML  layered  pattern  feature  will  be  ignored  and  the  wild  card  treated  as 
a  variable.  For  this  problem  it  is  not  necessary  to  know  the  representation  of 
variables,  so  they  are  introduced  as  a  given  set  along  with  Pattern  itself,  introduced  for 
the  purposes  of  the  recursive  definition: 

[Pattern,  Variable] 

Constructed  patterns  may  be  formed  using  the  following  schema: 

── Cons_patt ────────────────────────────
con : CON; patt : Pattern
─────────────────────────────────────────


Using  this,  the  structure  of  patterns  is  given  by 

Pattern ::= unitp | sconstp «SCONST» | var «Variable» | consp «Cons_patt»
          | tupp «tuple Pattern»

The  definition  of  pattern  matching  is  given  in  terms  of  a  relation,  matches,  specifying 
which  values  match  a  given  pattern: 

matches : Pattern ↔ Value


This will be defined recursively over the structure of patterns and values. For the
special constants a simplification will be made. The set of special constants is
infinite and can never be matched by a finite set of patterns unless it contains a
variable. For this case, it seems excessive to keep track of the special constants in a
set of patterns simply in order to check for redundancy. Consequently the
specification is relaxed to omit the redundancy check in this case. This will be done
by  allowing  special  constants  to  be  matched  by  variables  only.  (Note  that  this  is  not 
necessary  to  our  approach,  but  apart  from  being  a  sensible  relaxation  it  simplifies 
the  presentation.)  For  the  other  primitive  patterns,  the  unit  pattern  matches  the  unit 
value  and  a  variable  matches  any  value.  This  is  expressed  by  the  following  schema: 

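── MPrimitive ───────────────────────────
p : Pattern; v : Value
─────────────────────────────────────────
p = unitp ∧ v = unitv
∨
p ∈ ran var
─────────────────────────────────────────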


A  constructed  pattern  is  matched  by  a  constructed  value  if  the  constructors  are  the 
same and the parameters match:

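── MConstructions ───────────────────────
p : Pattern; v : Value
─────────────────────────────────────────
p ∈ ran consp ∧ v ∈ ran consv
pcon.con = vcon.con ∧ pcon.patt matches vcon.val
where
pcon == consp⁻¹ p
vcon == consv⁻¹ v
─────────────────────────────────────────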


Similarly,  a  tuple  pattern  matches  a  tuple  value  if  the  two  tuples  are  the  same  size 
and  corresponding  elements  match. 

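── MTuples ──────────────────────────────
p : Pattern; v : Value
─────────────────────────────────────────
p ∈ ran tupp ∧ v ∈ ran tupv
#tp = #tv
∀ i : 1 .. #tp • (tp i) matches (tv i)
where
tp == tupp⁻¹ p
tv == tupv⁻¹ v
─────────────────────────────────────────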


This  gives,  for  the  definition  of  matches  : 
∀ p : Pattern; v : Value • p matches v ⇔ MPrimitive ∨ MConstructions ∨ MTuples

The  value  coverage  of  a  set  of  patterns  is  the  set  of  values  which  may  be  matched  to 
the  patterns.  As  a  variable  matches  any  value  it  is  necessary  to  restrict  the  set  of 
values to one type so the compiling operations are specified in terms of a valcover
function  as  follows: 

valcover == λ patt : Pattern; ty : Type • {v : ty | patt matches v}

A set of patterns patts of type ty is exhaustive when

⋃ {p : patts • valcover(p, ty)} = ty
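For a finite type the definitions of matches, valcover and exhaustiveness can be executed directly. The Python sketch below is only an illustration: the tagged-tuple encoding, the helper constructors con, var and tup, and the explicit enumeration of a type as a Python set are assumptions made here, and the approach does not extend to the infinite types that motivate the refinement in section 3.

```python
from itertools import product

# Patterns: ("unit",) ("sconst", c) ("var", name) ("cons", con, patt) ("tup", (p1, ...))
# Values:   ("unit",) ("sconst", c) ("cons", con, val) ("tup", (v1, ...))
UNIT = ("unit",)


def matches(p, v):
    """Recursive reading of MPrimitive, MConstructions and MTuples."""
    tag = p[0]
    if tag == "var":
        return True
    if tag == "unit":
        return v[0] == "unit"
    if tag == "cons":
        return v[0] == "cons" and p[1] == v[1] and matches(p[2], v[2])
    if tag == "tup":
        return (v[0] == "tup" and len(p[1]) == len(v[1])
                and all(matches(q, u) for q, u in zip(p[1], v[1])))
    return False  # special constant patterns match nothing in the relaxed specification


def valcover(patt, ty):
    """The values of the (finite, enumerated) type ty matched by patt."""
    return {v for v in ty if matches(patt, v)}


def exhaustive(patts, ty):
    """True when the union of the value covers is the whole type."""
    return set().union(*(valcover(p, ty) for p in patts)) == set(ty)


# Helper constructors; constant constructors take the unit value as parameter.
def con(name, arg=UNIT):
    return ("cons", name, arg)


def var(name):
    return ("var", name)


def tup(*items):
    return ("tup", tuple(items))


COLOUR = {con(c) for c in ("red", "blue", "green")}
COLOUR_PAIR = {tup(a, b) for a, b in product(COLOUR, repeat=2)}
```

As a usage example, exhaustive([tup(con("red"), var("x")), tup(var("y"), var("z"))], COLOUR_PAIR) holds, while the pair of patterns (red, x) and (blue, x) leaves the values beginning with green unmatched.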

2.3  The  compatibility  of  types  and  patterns 

The  implementation  of  the  compiling  operations  is  considerably  simplified  if  it  is 
possible  to  make  use  of  the  fact  that  the  patterns  are  well-typed.  This  constraint  may 
be expressed using a relation, similar to same_type, which specifies which patterns are
compatible  with  a  type.  This  relation  is  defined  recursively  in  the  usual  way  as 
follows: 

_pattern_compatible_ : Pattern ↔ Type

For  the  primitive  patterns,  the  unit  pattern  is  compatible  with  the  unit  type,  the 
special  constant  patterns  are  compatible  with  the  special  constant  type  and  a 
variable  is  compatible  with  any  type: 

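── PCPrimitive ──────────────────────────
p : Pattern; type : Type
─────────────────────────────────────────
p = unitp ∧ type = {unitv}
∨
p ∈ ran sconstp ∧ type = ran sconstv
∨
p ∈ ran var
─────────────────────────────────────────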

For  constructed  patterns,  the  constructor  used  must  be  present  in  the  set  used  to 
construct  the  values  of  the  datatype,  and  the  parameter  must  be  compatible  with  the 
parameter  type  of  the  constructor: 


── PCConstructions ──────────────────────
p : Pattern; type : Type
─────────────────────────────────────────
p ∈ ran consp ∧ type ⊆ ran consv
con ∈ {Cons_val | consv(θCons_val) ∈ type • con}
patt pattern_compatible type_of_con(con)
where
Cons_patt
consp(θCons_patt) = p
─────────────────────────────────────────


A  tuple  pattern  is  compatible  if  the  size  of  the  tuple  is  the  same  as  the  tuple  size  of 
the  type,  and  each  element  of  the  pattern  is  compatible  with  the  types  formed  from 
the elements of the tuple type:

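── PCTuples ─────────────────────────────
p : Pattern; type : Type
─────────────────────────────────────────
p ∈ ran tupp ∧ type ⊆ ran tupv
#tp = Size type
∀ i : 1 .. #tp
    • (tp i) pattern_compatible {tv : tupv⁻¹⦇type⦈ • tv i}
where
tp == tupp⁻¹ p
─────────────────────────────────────────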


∀ p : Pattern; type : Type
    • p pattern_compatible type ⇔ PCPrimitive ∨ PCConstructions ∨ PCTuples

For  convenience,  the  predicate  is  defined  as  a  schema: 

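── Pattern_Compatible ───────────────────
p : Pattern; type : Type
─────────────────────────────────────────
p pattern_compatible type
─────────────────────────────────────────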
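The compatibility relation can likewise be sketched as a recursive check. In the Python below, types are represented by structural descriptors rather than by sets of values as in the Z text. That representation, the table TYPE_OF_CON and all names are assumptions made for illustration, with int*int and string both mapped to the single special constant type in line with the simplifications of section 2.1.

```python
# Type descriptors (an encoding assumed here, not the report's sets of values):
#   ("unit",)  ("sconst",)  ("data", frozenset_of_constructors)  ("tup", (t1, t2, ...))
UNIT_T = ("unit",)
SCONST_T = ("sconst",)
COLOUR_T = ("data", frozenset({"red", "blue", "green"}))
LABEL_T = ("data", frozenset({"code", "number", "name"}))

# Stand-in for type_of_con: the parameter type of each constructor.
TYPE_OF_CON = {
    "red": UNIT_T, "blue": UNIT_T, "green": UNIT_T,
    "code": COLOUR_T,
    "number": ("tup", (SCONST_T, SCONST_T)),
    "name": SCONST_T,
}


def pattern_compatible(p, ty):
    """Recursive reading of PCPrimitive, PCConstructions and PCTuples."""
    tag = p[0]
    if tag == "var":
        return True                      # a variable is compatible with any type
    if tag == "unit":
        return ty == UNIT_T
    if tag == "sconst":
        return ty == SCONST_T
    if tag == "cons":
        # The constructor must belong to the datatype and its parameter
        # pattern must be compatible with the constructor's parameter type.
        return (ty[0] == "data" and p[1] in ty[1]
                and pattern_compatible(p[2], TYPE_OF_CON[p[1]]))
    if tag == "tup":
        return (ty[0] == "tup" and len(p[1]) == len(ty[1])
                and all(pattern_compatible(q, t) for q, t in zip(p[1], ty[1])))
    return False


RED_P = ("cons", "red", ("unit",))
CODE_RED_P = ("cons", "code", RED_P)
```

Here pattern_compatible(CODE_RED_P, LABEL_T) holds, but the same pattern is rejected against COLOUR_T because code is not a constructor of colour.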


2.4  The  compiling  operations 

The  initial  operation  generates  the  set  of  values  from  the  first  pattern  to  be 
compiled: 


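── Init_op ──────────────────────────────
vals! : ℙ Value; par? : Pattern; type : Type
─────────────────────────────────────────
par? pattern_compatible type ∧ vals! = valcover(par?, type)
─────────────────────────────────────────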


The  type  of  the  pattern  being  compiled  is  given  by  type,  and  as  we  are  not  concerned 
with  type  checking  aspects  of  the  compiler,  this  is  treated  as  a  constant  throughout 
all  the  pattern  checking  operations. 

It  is  convenient  to  express  the  result  of  the  checking  operation  in  terms  of  a  new  Z 
datatype:

Result ::= OK | INCOMPLETE | REDUNDANT

The  checking  operation  is  specified  in  terms  of  the  set  of  values  covered  by  the 
patterns  compiled  so  far  and  the  effect  of  adding  one  more  pattern. 

── Check_op ─────────────────────────────
vals?, vals! : ℙ Value
par? : Pattern
r! : Result
type : Type
─────────────────────────────────────────
par? pattern_compatible type ∧ vals? ⊆ type
vals! = valcover(par?, type) ∪ vals?
valcover(par?, type) ≠ {} ∧ vals! = vals? ∧ r! = REDUNDANT
∨
(valcover(par?, type) = {} ∨ vals! ≠ vals?) ∧ r! = OK
─────────────────────────────────────────


In  this  schema,  vals?  represents  the  set  of  values  accounted  for  by  the  patterns 
compiled  so  far  and  vals!  the  result  of  adding  the  new  pattern  par?.  The  result  of  the 
operation is left in r!. The predicate valcover(par?, type) ≠ {} eliminates reporting a
redundancy  when  the  patterns  include  special  constants. 

The  pre-condition  of  this  operation  is  easily  simplified  to 

par? pattern_compatible type ∧ vals? ⊆ type

If  this  is  so,  then  from  the  definition  of  valcover  it  follows  that  vals!  is  also  a  subset  of 
type,  and  for  both  operations.  This  is  essential  as  the  placing  of  the  compiling 
operation with respect to the syntax of ML ensures that the output of Init_op will form
the input to Check_op as does the output of Check_op itself. Strictly speaking, type is
redundant in Check_op as it could be deduced from vals?. (A function type_of, which
gives the type of a value, has already been defined.) However, retaining the type in
this way leads to a clearer specification.

The  final  operation  checks  whether  the  set  of  patterns  is  exhaustive  and  is  simply 
given by:

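── Final_op ─────────────────────────────
vals? : ℙ Value; type : Type; r! : Result
─────────────────────────────────────────
vals? = type ∧ r! = OK ∨ vals? ≠ type ∧ r! = INCOMPLETE
─────────────────────────────────────────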


As  with  the  other  operations,  the  ML  syntax  determines  when  this  operation  is 
called: the input is provided either by Init_op for a one-pattern match, or by the last
call  of  Check_op  in  a  multi-pattern  match. 
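The way the three operations chain together over a match can be sketched in Python for a finite type. This is an illustration under assumptions, not the report's implementation: each pattern is represented only by its precomputed value cover (the set valcover(par?, type)), the pre-conditions are taken as satisfied, and the driver run_match is a hypothetical helper introduced here.

```python
OK, INCOMPLETE, REDUNDANT = "OK", "INCOMPLETE", "REDUNDANT"


def init_op(cover):
    """Init_op: vals! is the value cover of the first pattern."""
    return set(cover)


def check_op(vals_in, cover):
    """Check_op: add one pattern's cover and report whether it was redundant."""
    vals_out = vals_in | cover
    # A pattern with an empty cover (a special constant) is never reported.
    redundant = bool(cover) and vals_out == vals_in
    return vals_out, (REDUNDANT if redundant else OK)


def final_op(vals_in, ty):
    """Final_op: the match is exhaustive only if every value of the type is covered."""
    return OK if vals_in == ty else INCOMPLETE


def run_match(covers, ty):
    """Hypothetical driver: Init_op, then Check_op per further pattern, then Final_op."""
    vals = init_op(covers[0])
    results = []
    for cover in covers[1:]:
        vals, r = check_op(vals, cover)
        results.append(r)
    results.append(final_op(vals, ty))
    return results


COLOUR = {"red", "blue", "green"}
```

For the colour_perm example, run_match([{"red"}, {"blue"}, {"green"}], COLOUR) gives OK for both checks and for the final test, while omitting the last clause makes the final result INCOMPLETE.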

This  completes  the  specification  of  the  problem  which  should  be  checked  for 
validity,  that  is,  that  it  accurately  captures  the  essence  of  what  has  been  informally 
specified  in  the  ML  definition. 

Z_match_spec keeps Value, Cons_val, datatype, unitv, sconstv, consv, tupv, Cons_patt,
Constructors, Size, Type, SCONST, CON, type_of_con, tuple,
Pattern, unitp, var, sconstp, consp, tupp,
matches, MPrimitive, MConstructions, MTuples,
PCPrimitive, PCConstructions, PCTuples,
pattern_compatible, Pattern_Compatible,
Result, OK, INCOMPLETE, REDUNDANT,
valcover, Init_op, Check_op, Final_op
3 Formal implementation - the abstraction invariant


Z_match_spec : Module


The  specification  has  been  written  in  a  form  which  matches  as  closely  as  possible 
the  informal  specification  given  in  the  language  definition.  As  such  it  is  defined  in 
terms  of  possibly  infinite  sets  of  values  and  it  is  not  possible  to  implement  a  test 
such as vals? = type in Final_op directly. Instead a coverage measure is used to keep
track  of  how  completely  a  set  of  patterns  accounts  for  the  values  in  a  type.  Data 
refinement  is  used  to  specify  the  operations  on  coverages  which  correspond  to  the 
operations  in  the  specification.  The  key  step  in  data  refinement  is  to  define  the 
abstraction  invariant  which  specifies  the  set  of  values  which  correspond  to  a  given 
coverage. 

The  first  question  to  be  settled  in  carrying  out  the  data  refinement  is  the  form  for  the 
coverage  measure.  The  form  chosen  must  be  easily  implementable  and  the  check 
not  excessively  time-consuming.  The  essentials  of  the  problem  are  that  for 
constructed  patterns  every  constructor  must  be  accounted  for  and  for  tuple  patterns 
the  individual  elements  must  be  complete.  This  latter  property  is  surprisingly  hard 
to  formalise,  so  it  is  useful  to  have  a  few  test  cases  to  clarify  what  is  actually 
required.  Using  the  datatype  definitions  given  previously,  consider  the  case  of  a 
3-tuple  of  colours.  The  following  sequence  of  patterns  is  complete  and  not  redundant 
in  the  sense  that  each  succeeding  pattern  matches  more  values  and  the  complete  set 
of  values  is  only  accounted  for  with  the  final  pattern. 

Pattern                 Number of extra values matched

(red, blue, green)      1
(x, blue, green)        2
(red, x, green)         2
(red, blue, x)          2
(red, x, y)             4
(blue, green, red)      1
(blue, x, y)            7
(x, green, green)       1
(x, red, green)         1
(x, y, z)               6


It  is  worth  while  checking  this  table  to  convince  yourself  that  if  the  concept  of 
exhaustive  patterns  is  intuitive,  the  actual  check  to  apply  in  the  tuple  case  is  not.  In 
order  to  handle  tuples  generally,  the  coverage  measure  adopted  will  involve  a 
function  from  the  coverage  provided  by  the  first  element  of  the  tuple  to  the  coverage 
provided  by  the  remainder  of  the  tuple,  rather  than  having  a  tuple  of  coverages.  This 
will  become  clearer  as  the  formalism  is  developed. 
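The counts in the table can be confirmed by brute force, since a 3-tuple of colours has only 27 values. The Python sketch below is an illustrative check written for this purpose; the encoding of a variable as None and the function names are assumptions, and enumeration of this kind is exactly what the coverage measure is intended to avoid for larger or infinite types.

```python
from itertools import product

COLOURS = ("red", "blue", "green")
ALL = set(product(COLOURS, repeat=3))   # the 27 values of the tuple type

X = None  # a variable position, matching any colour


def cover(pattern):
    """Values of the 3-tuple type matched by a pattern of constants and variables."""
    return {v for v in ALL if all(p is None or p == c for p, c in zip(pattern, v))}


PATTERNS = [
    ("red", "blue", "green"),
    (X, "blue", "green"),
    ("red", X, "green"),
    ("red", "blue", X),
    ("red", X, X),
    ("blue", "green", "red"),
    ("blue", X, X),
    (X, "green", "green"),
    (X, "red", "green"),
    (X, X, X),
]


def extra_counts(patterns):
    """Number of newly matched values per pattern, and whether all values are covered."""
    seen, counts = set(), []
    for p in patterns:
        new = cover(p) - seen
        counts.append(len(new))
        seen |= new
    return counts, seen == ALL
```

Calling extra_counts(PATTERNS) returns ([1, 2, 2, 2, 4, 1, 7, 1, 1, 6], True): every pattern adds at least one value, the counts sum to 27, and the type is fully covered only after the last pattern.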


The  other  complication  in  the  problem  is  that  it  is  necessary  to  account  for 
constructor  functions  as  well  as  constants  in  the  patterns.  Thus  for  a  pair  of  labels  one 
has the following sequence of exhaustive but not redundant patterns:


(code(red), x)
(x, number(0, y))
(code(x), y)
(x, name(y))
(x, y)

It is possible to treat constructed patterns as though they were 2-tuples, using a
function  in  exactly  the  same  way  as  for  tuples.  This  approach  has  not  been  taken  in 
the  implementation  presented  here,  partly  for  reasons  of  efficiency  and  partly  for 
ease  of  understanding.  Instead,  a  function  from  constructors  to  coverages  is  used. 

Special  constants  are  treated  by  having  an  incomplete  coverage  measure,  which 
represents  an  empty  set  of  values,  corresponding  to  the  fact  that,  in  the  specification, 
a  special  constant  pattern  matches  no  value.  Treating  the  special  constants 
accurately  would  require  keeping  a  measure  dependent  on  the  number  of  constants 
already  accounted  for. 

With  this  motivation  the  Z  datatype  defining  the  coverage  measure  may  be  written 
down: 

Cover ::= complete | incomplete
        | construct «CON ⇻ Cover» | pair «𝔽 (Cover × Cover)»

Thus  the  coverage  measure  chosen  is  a  tree  in  which  the  leaves  correspond  to  full  or 
no  coverage  and  the  nodes  of  the  tree  correspond  to  the  constructor  and  tuple 
structure  of  the  type.  The  parameter  of  pair  is  given  as  a  set  of  pairs  rather  than  a 
function  as  this  seems  to  make  the  explanation  easier.  Finite  sets  are  used  to  ensure 
that  the  datatype  is  satisfiable.  Before  going  on  to  define  the  abstraction  function,  it 
may  be  mentioned  that  the  sequence  of  patterns  compiled  also  forms  a  coverage 
measure.  However,  it  is  worth  having  a  separate  coverage  datatype  in  order  to 
optimise  the  operations  required. 

The  formal  definition  of  the  abstraction  invariant  will  be  motivated  by  giving  an 
example  of  the  intended  relation  between  coverages  and  patterns.  The  first  two 
patterns  from  the  sequence  of  label  pairs  above  are  to  be  represented  by  the  following 
two coverages:

pair {construct {code ↦ construct {red ↦ complete}} ↦ complete}
pair {complete ↦ construct {number ↦ pair {incomplete ↦ complete}}}
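The correspondence between a single pattern and its coverage can be made concrete with a small Python sketch. The rule used is inferred only from the two examples above and is an assumption, not the report's formal definition: a variable or unit pattern gives complete, a special constant gives incomplete, a constructed pattern gives a one-entry construct, and a tuple gives a pair from the coverage of its first element to the coverage of the remainder. The encoding and the helper names construct, pair and cover_of are hypothetical.

```python
COMPLETE, INCOMPLETE = "complete", "incomplete"


def construct(mapping):
    """A construct node: a finite map from constructors to coverages."""
    return ("construct", frozenset(mapping.items()))


def pair(*maplets):
    """A pair node: a finite set of (first-element cover, remainder cover) pairs."""
    return ("pair", frozenset(maplets))


def cover_of(p):
    """Hypothetical helper: coverage contributed by one pattern (inferred rule)."""
    tag = p[0]
    if tag in ("var", "unit"):
        return COMPLETE
    if tag == "sconst":
        return INCOMPLETE
    if tag == "cons":
        return construct({p[1]: cover_of(p[2])})
    # Tuple: cover of the first element paired with the cover of the remainder;
    # a remainder of length one is covered as that single element.
    head, rest = p[1][0], p[1][1:]
    tail = cover_of(rest[0]) if len(rest) == 1 else cover_of(("tup", rest))
    return pair((cover_of(head), tail))


# Patterns as tagged tuples: ("var", n) ("sconst", c) ("unit",) ("cons", con, p) ("tup", (...))
RED_P = ("cons", "red", ("unit",))
P1 = ("tup", (("cons", "code", RED_P), ("var", "x")))                      # (code(red), x)
P2 = ("tup", (("var", "x"),
              ("cons", "number", ("tup", (("sconst", 0), ("var", "y"))))))  # (x, number(0, y))
```

Under these assumptions cover_of(P1) and cover_of(P2) reproduce the two coverages shown above. Combining successive coverages into one is the harder problem and is not attempted in this sketch.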

The  essence  of  the  problem  is  to  combine  successive  coverages  into  one  in  such  a 
way  as  to  end  up  with  the  complete  cover  when  the  set  of  patterns  is  exhaustive. 

For  the  presentation  of  the  abstraction  invariant  some  functions  for  manipulating 
tuple  types  are  necessary.  From  a  tuple  type  one  can  form  a  type  corresponding  to 
the  first  element  of  the  tuple  and  one  corresponding  to  the  remainder  of  the  tuple, 
given  by  two  functions  HD  and  TL  as  follows: 


Note  that  these  functions  deliver  a  type  rather  than  an  arbitrary  set  of  values  because 
successive  elements  of  values  in  a  tuple  type  must  have  the  same  type. 

The  basic  requirement  for  the  abstraction  is  a  function  from  a  Cover  value  to  a  set  of 
Value.  This  cannot  be  provided  from  the  datatype  chosen  because  the  set  of  values 
from  a  complete  cover  will  depend  on  the  type.  However,  the  actual  check  to  be 
made  is  independent  of  the  type  in  this  case,  so  the  specification  contains  redundant 
information  as  far  as  these  particular  checks  are  concerned.  (This  state  of  affairs  is 
called  bias.)  It  is  still  possible  to  carry  out  the  refinement,  but  in  order  to  do  so,  it  is 
necessary  to  have  a  function  from  a  Cover  and  a  Type  to  a  set  of  Value.  A  unique 
definition  of  this  function  will  be  provided,  built  up  according  to  the  structure  of  Cover 
and  Type  in  the  usual  way. 

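Abs_fn : (Cover × Type) ⇸ ℙ Value

For the primitive coverage elements we have the following:

┌─ AIPrimitive ──────────────────────────────
│ c : Cover; type : Type; vals : ℙ Value
├────────────────────────────────────────────
│ (c = complete ∧ vals = type) ∨ (c = incomplete ∧ vals = {})
└────────────────────────────────────────────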

For  constructed  coverages  the  cover  represents  any  constructed  value  it  is  possible 
to  form  from  a  constructor  in  the  domain  of  the  constructor  function  combined  with  a 
value drawn from the set represented by the corresponding coverage element in the
range  of  the  function.  This  is  expressed  by  the  following  schema: 


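┌─ AIConstructions ──────────────────────────
│ c : Cover; type : Type; vals : ℙ Value
├────────────────────────────────────────────
│ c ∈ ran construct ∧ type ⊆ ran consv
│ vals = {Cons_val; F : CON ⇻ Cover
│         | c = construct F ∧ con ∈ dom F
│           ∧ val ∈ Abs_fn(F con, type_of_con(con))
│         • consv θCons_val
│        }
└────────────────────────────────────────────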


For  tuples  we  need  a  function  from  pairs  of  coverages  to  sets  of  values  as  follows: 

Tuple_vals : (Cover × Cover × Type) ⇸ ℙ tuple Value

∀ c₁, c₂ : Cover; type : Type; vals : ℙ tuple Value
| type ⊆ ran tupv
• Tuple_vals(c₁, c₂, type) = vals
  ⇔
  Size type = 2 ∧ vals = {v₁ : Abs_fn(c₁, HD type); v₂ : Abs_fn(c₂, TL type)
                          • ⟨v₁, v₂⟩
                         }
  ∨
  Size type > 2 ∧ vals = {v₁ : Abs_fn(c₁, HD type); pc₁, pc₂ : Cover;
                          tv : tuple Value
                          | pc₁ ↦ pc₂ ∈ pair⁻¹ c₂
                            ∧ tv ∈ Tuple_vals(pc₁, pc₂, TL type)
                          • v₁ cons tv
                         }


The  set  of  values  corresponding  to  a  pair  of  covers  is  that  set  of  tuple  values  formed 
by  taking  any  element  from  the  set  corresponding  to  the  first  cover  for  the  first 
element  and  combining  it  with  any  element  from  the  set  corresponding  to  the 
second. Thus if the first cover corresponds to n values and the second to m, the pair of
covers corresponds to n × m tuple values. For longer tuples, the second cover value
will itself be a pair giving rise to a set of tuple values whose size is one less than the
original. Form a set by taking any element from the values corresponding to the first
cover  and  add  this  to  the  front  of  any  element  in  the  tuples  provided  by  the  second 
cover. 

With  this  auxiliary  function  the  abstraction  function  for  tuples  may  be  defined  as 
follows: 
┌─ AITuples ─────────────────────────────────
│ c : Cover; type : Type; vals : ℙ Value
├────────────────────────────────────────────
│ c ∈ ran pair ∧ type ⊆ ran tupv
│ vals = ⋃ {c₁, c₂ : Cover | c₁ ↦ c₂ ∈ pair⁻¹ c • tupv ⦇ Tuple_vals(c₁, c₂, type) ⦈}
└────────────────────────────────────────────

The  abstraction  function  itself  is  simply  given  by  the  constraint: 

Abs_fn = λ c : Cover; type : Type
         • μ vals : ℙ Value | AIPrimitive ∨ AIConstructions ∨ AITuples • vals
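The abstraction function can be made concrete for finite types with a short Python sketch. Everything about the encoding is an assumption made for illustration: types are modelled structurally as `("prim", values)`, `("data", {constructor: parameter_type})` or `("tuple", (t1, t2, ...))`, covers as the strings `"complete"` and `"incomplete"` or tagged tuples, constructed values as `(constructor, value)` pairs and tuple values as flat Python tuples. The longer-tuple case delegates to `abs_fn` on the remainder of the tuple rather than unpacking the inner pair directly, so that a `complete` remainder is also handled.

```python
from itertools import product

COMPLETE, INCOMPLETE = "complete", "incomplete"


def all_values(ty):
    """Every value of a finite type: the set that `complete` stands for."""
    kind, body = ty
    if kind == "prim":
        return set(body)
    if kind == "data":
        return {(con, v) for con, param in body.items() for v in all_values(param)}
    return set(product(*(all_values(t) for t in body)))  # kind == "tuple"


def hd(ty):
    """Type of the first element of a tuple type."""
    return ty[1][0]


def tl(ty):
    """Type of the remainder of a tuple type."""
    rest = ty[1][1:]
    return rest[0] if len(rest) == 1 else ("tuple", rest)


def tuple_vals(c1, c2, ty):
    """Tuple values represented by the pair of covers c1 |-> c2."""
    firsts = abs_fn(c1, hd(ty))
    if len(ty[1]) == 2:
        return {(v1, v2) for v1 in firsts for v2 in abs_fn(c2, tl(ty))}
    # Longer tuple: put each first value on the front of each remaining tuple.
    return {(v1,) + tv for v1 in firsts for tv in abs_fn(c2, tl(ty))}


def abs_fn(cover, ty):
    """Set of values of type `ty` represented by `cover`."""
    if cover == COMPLETE:
        return all_values(ty)
    if cover == INCOMPLETE:
        return set()
    tag, body = cover
    if tag == "construct":  # body: dict from constructor to parameter cover
        return {(con, v) for con, param_cover in body.items()
                for v in abs_fn(param_cover, ty[1][con])}
    # tag == "pair": body is a collection of (c1, c2) pairs
    return set().union(*(tuple_vals(c1, c2, ty) for c1, c2 in body))


# Hypothetical example: a three-colour datatype and pairs of colours.
UNIT = ("prim", frozenset({()}))
colour = ("data", {"red": UNIT, "green": UNIT, "blue": UNIT})
pair_type = ("tuple", (colour, colour))
red_any = ("pair", [(("construct", {"red": COMPLETE}), COMPLETE)])
red_any_values = abs_fn(red_any, pair_type)
```

Here `red_any` stands for the pairs whose first element is `red`, so `red_any_values` holds three of the nine possible pairs.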

Finally,  it  is  convenient  to  define  the  abstraction  invariant  as  a  schema: 

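┌─ AI ───────────────────────────────────────
│ c : Cover; type : Type; vals : ℙ Value
├────────────────────────────────────────────
│ vals = Abs_fn(c, type)
└────────────────────────────────────────────

Z_match_AI keeps Cover, complete, incomplete, construct, pair,
HD, TL,
AI, Abs_fn, AIPrimitive, AIConstructions, AITuples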


4  The  refined  operations 

4.1  The  initial  operation 


Z_match_spec : Module
Z_match_AI : Module


It is interesting to follow the technique given in Morgan [1988] in which the concrete
operations corresponding to the abstract operations are actually calculated from the
abstraction invariant. To summarise the notation, which will be slightly adapted
from that used by Morgan, an operation is represented by its pre- and post-conditions
as below:

Δ av [pre, post]

This  represents  an  operation  achieving  a  state  of  affairs  specified  by  the  predicate  post.  It 
must  be  given  an  initial  state  represented  by  pre  and  achieves  it  by  altering  the  abstract 
variable  av. 

Given  an  abstraction  invariant  AI,  involving  the  concrete  variable  cv,  the  corresponding 
concrete  operation  is  simply  given  by  the  formula 

Δ cv [∃ av • AI ∧ pre, ∃ av • AI ∧ post]

This  technique  will  be  applied  first  of  all  to  the  operation  Init_op.  In  the  refinement 
notation we can write:

Init_op ⊑ Δ vals [par pattern_compatible type, vals = valcover(par, type)]

This statement is an assertion that the operation on the right hand side of the ⊑ symbol is
an operation refinement of the Z schema operation Init_op. In the refinement notation,
operations  are  expressed  in  terms  of  variables  assumed  to  be  declared  within  the 
current  context  of  the  operation.  Variables  in  the  pre-condition  refer  to  values  before 
the  operation  while  variables  in  the  post-condition  refer  to  values  after. 
Consequently,  there  is  no  need  for  the  Z  decorations  of  !,  ?  or  '  and  these  are 
systematically  dropped.  The  pre-condition  for  the  operation  has  been  derived  by 
existentially  quantifying  over  the  output  variables  in  the  Z  schema  and  simplifying. 

For  the  data  refinement,  the  concrete  variable  is  the  coverage,  c,  of  type  Cover  while  the 
abstract variable is the set of values vals, of type ℙ Value. The concrete operations will
eventually  prove  to  be  independent  of  the  type  which  may  simply  be  discarded. 
Using  the  abstraction  invariant  and  the  data  refinement  symbol  our  operation  is 
refined  as  follows: 

≼ Δ c [∃ vals : ℙ Value • AI ∧ par pattern_compatible type,
       ∃ vals : ℙ Value • AI ∧ vals = valcover(par, type)]
This operation is guaranteed to be a correct data refinement of Init_op. Looking at the
post-condition,  it  is  clear  that  it  is  necessary  to  calculate  a  coverage  from  the  input 
pattern.  This  coverage  must  be  such  that  the  application  of  the  abstraction  function 
gives  the  same  set  of  values  as  that  provided  by  valcover  when  applied  to  the  input 
pattern.  It  is  fairly  obvious,  once  one  has  stood  back  from  the  trees  of  the  formalism 
to  view  the  wood  of  the  problem,  that  the  initial  value  of  c  is  irrelevant.  Consequently 
there  is  no  problem  incurred  in  weakening  the  pre-condition  and  simplifying  the 
post-condition by substituting for vals as follows:

⊑ Δ c [par pattern_compatible type, Abs_fn(c, type) = valcover(par, type)]

The  implementation  problem  therefore  is  to  define  a  coverage  function  which  relates  a 
Pattern  to  a  Cover  in  the  manner  required: 

coverage : Pattern → Cover

For this to be a correct implementation of the initial operation, the following
theorem has to be proved:

Pattern_compatible
⊢
Abs_fn(coverage(p), type) = valcover(p, type)

The specification of coverage will be deferred to a later section because the other
operations introduce further constraints on its definition.

4.2  The  checking  operation 

Proceeding  in  the  same  way  as  before 

Check_op ⊑ con vals₀ • Δ vals, r [vals = vals₀ ∧ par pattern_compatible type
                                  ∧ vals ⊆ type, Check_op]

This  step  has  introduced  some  more  refinement  notation.  Where,  as  in  this  case,  the 
post-condition  is  dependent  on  the  values  both  before  and  after  the  operation,  it  is 
necessary  to  preserve  the  initial  value  using  a  logical  constant,  which  is  introduced  with 
the  reserved  word  con.  By  convention,  initial  values  are  indicated  with  a  0-subscript, 
so vals₀ corresponds to the input value vals? in the schema.

Note  that  in  this  specification  the  pre-condition  does  not  record  the  fact  that  the 
input  values  are  provided  from  the  results  of  the  initial  operation  or  a  previous  use  of 
the  checking  operation  as  the  case  may  be.  This  will  be  introduced  informally  later. 
The  data  refinement  for  the  checking  operation  can  be  written  as: 


≼ con vals₀, c₀ • Δ c, r [∃ vals : ℙ Value
                          • AI ∧ vals = vals₀ ∧ c = c₀
                            ∧ par pattern_compatible type ∧ vals ⊆ type,
                          ∃ vals : ℙ Value • AI ∧ Check_op]

Because Abs_fn(c, type) is always a subset of type, the pre-condition may be replaced
by vals₀ = Abs_fn(c₀, type) ∧ par pattern_compatible type.

The  post-condition  may  be  simplified,  by  eliminating  vals  and  valsQ,  into  a  predicate 
described by the following schema:

┌─ Check_op_1 ───────────────────────────────
│ c, c₀ : Cover
│ par : Pattern
│ type : Type
│ r : Result
├────────────────────────────────────────────
│ par pattern_compatible type
│ Abs_fn(c, type) = valcover(par, type) ∪ Abs_fn(c₀, type)
│ valcover(par, type) ≠ {} ∧ Abs_fn(c, type) = Abs_fn(c₀, type)
│   ∧ r = REDUNDANT
│ ∨ (valcover(par, type) = {} ∨ Abs_fn(c, type) ≠ Abs_fn(c₀, type))
│   ∧ r = OK
└────────────────────────────────────────────


We  already  have  the  requirement  to  find  a  coverage  function  such  that 
valcover(par?, type) = Abs_fn(coverage(par?), type), so it is tempting to define a union
function between coverages which carries out the corresponding operation to
forming a union of sets of values:

union : (Cover × Cover) ⇸ Cover

and refine to the operation

┌─ Check_op_2 ───────────────────────────────
│ c, c₀ : Cover
│ par : Pattern
│ type : Type
│ r : Result
├────────────────────────────────────────────
│ c = union(coverage(par), c₀)
│ coverage(par) ≠ incomplete ∧ c = c₀ ∧ r = REDUNDANT
│ ∨ (coverage(par) = incomplete ∨ c ≠ c₀) ∧ r = OK
└────────────────────────────────────────────


For  this  to  be  so,  the  following  theorem  must  be  proved: 

Check_op_2
⊢
Check_op_1

This corresponds to strengthening the post-condition, namely, that the achievement
of Check_op_2 will entail the achievement of Check_op_1. Note that the type has now
dropped out of the predicates.

For  the  proof  of  this  theorem,  it  is  necessary  to  show  that  the  union  function  behaves 
like  set  union  and  that  there  is  a  unique  representation  of  the  empty  set  of  values.  It 
is  difficult  to  define  a  function  having  the  properties  required  without  taking  into 
account the fact that the coverages have all been derived from patterns of the same
type. Consequently, extra constraints will be added to the specification which are
satisfied by the pre-condition. These constraints will be defined in terms of a
cover_compatible relation, and express the fact that the union function is only required
to  combine  coverages  of  the  same  structure. 

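This is defined in a similar way to pattern_compatible:

_ cover_compatible _ : Cover ↔ Type

┌─ CCPrimitive ──────────────────────────────
│ c : Cover; type : Type
├────────────────────────────────────────────
│ c = complete ∨ c = incomplete
└────────────────────────────────────────────

┌─ CCConstructions ──────────────────────────
│ c : Cover; type : Type
├────────────────────────────────────────────
│ c ∈ ran construct ∧ type ⊆ ran consv
│ dom F ⊆ {Cons_val | consv(θCons_val) ∈ type • con}
│ ∀ con : dom F • (F con) cover_compatible (type_of_con con)
│ where
│   F == construct⁻¹ c
└────────────────────────────────────────────

┌─ CCTuples ─────────────────────────────────
│ c : Cover; type : Type
├────────────────────────────────────────────
│ c ∈ ran pair ∧ type ⊆ ran tupv
│ ∀ c : dom F • c cover_compatible HD type
│ ∀ c : ran F • c cover_compatible TL type
│ where
│   F == pair⁻¹ c
└────────────────────────────────────────────

∀ c : Cover; type : Type
• c cover_compatible type ⇔ CCPrimitive ∨ CCConstructions ∨ CCTuples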

In  the  refined  operation,  the  compatibility  of  the  coverage  c  with  the  type  is  guaranteed 
by the compatibility of the pattern par with the type while the fact that c₀ is compatible
is  guaranteed  by  the  initial  set  of  values  in  the  abstract  operation  being  generated  by 
matching  type-compatible  patterns.  This  is  all  rather  tedious  to  formalise  and  not 
very  illuminating,  so  these  theorems  will  not  be  stated.  The  theorems  expressing  the 
more  interesting  union  and  uniqueness  properties  may  be  broken  down  into  sub-goals 
defined  in  terms  of  the  following  schema,  which  gathers  together  the  parameters  of 
union  and  its  result,  and  has  the  type  as  a  parameter: 

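┌─ Union ────────────────────────────────────
│ c₁, c₂, c : Cover
│ type : Type
├────────────────────────────────────────────
│ c = union(c₁, c₂)
│ c₁ cover_compatible type
│ c₂ cover_compatible type
└────────────────────────────────────────────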


The first goal is related to the first constraint of Check_op_2:

Union
⊢
Abs_fn(c, type) = Abs_fn(c₁, type) ∪ Abs_fn(c₂, type)

The remaining goals are related to the second constraint:

Pattern_compatible
⊢
coverage(p) ≠ incomplete ⇒ valcover(p, type) ≠ {}

Union
⊢
c = c₂ ⇒ Abs_fn(c, type) = Abs_fn(c₂, type)

Union
⊢
c ≠ c₂ ⇒ Abs_fn(c, type) ≠ Abs_fn(c₂, type)

The  first  of  these  goals  requires  the  coverage  function  to  deliver  the  incomplete  value 
whenever  the  value  coverage  is  empty.  The  second  is  trivially  true  and  the  third 
requires  the  result  of  the  union  function  to  change  whenever  the  set  of  values  covered 
changes.  Further  constraints  on  union  will  emerge  with  the  definition  of  the  final 
operation. 

4.3  The  final  operation 

This  performs  the  final  check  for  whether  the  set  of  patterns  is  exhaustive  or  not. 
The  operation  is  simply  refined  as  follows: 

Final_op ⊑ Δ r [true, vals = type ∧ r = OK ∨ vals ≠ type ∧ r = INCOMPLETE]

≼ Δ r [true, c = complete ∧ r = OK ∨ c ≠ complete ∧ r = INCOMPLETE]

Here the data refinement and simplification of the pre- and post-conditions have been
carried out in one step. The input to the operation is simply the result of union, or, in the
case  of  a  single  clause  in  the  function  definition,  the  result  of  coverage.  Accordingly, 
there  are  further  constraints  to  satisfy.  These  are: 

Union
⊢
c = complete ⇒ Abs_fn(c, type) = type

Union
⊢
c ≠ complete ⇒ Abs_fn(c, type) ≠ type

p : Pattern; type : Type; c : Cover
⊢
c = coverage(p) = complete ⇒ Abs_fn(c, type) = type

p : Pattern; type : Type; c : Cover
⊢
c = coverage(p) ≠ complete ⇒ Abs_fn(c, type) ≠ type

The  first  and  third  of  these  goals  follow  immediately  from  the  definition  of  the 
abstraction  function,  while  the  second  and  fourth  again  require  a  unique 
representation  of  completeness. 

Z_match_ops keeps coverage, union, Union,
cover_compatible, CCPrimitive, CCConstructions, CCTuples


5  Abstract  algorithm  design 


Z_match_spec : Module
Z_match_AI : Module
Z_match_ops : Module

5.1  The  coverage  function 

We  have  to  provide  a  constructive  definition  of  the  function  which  satisfies  the 
following theorems:

Pattern_compatible
⊢
Abs_fn(coverage(p), type) = valcover(p, type)

Pattern_compatible
⊢
coverage(p) ≠ incomplete ⇒ valcover(p, type) ≠ {}

Pattern_compatible; c : Cover
⊢
c = coverage(p) = complete ⇒ Abs_fn(c, type) = type

Pattern_compatible; c : Cover
⊢
c = coverage(p) ≠ complete ⇒ Abs_fn(c, type) ≠ type

The first of these gives the basic property required, the second the property that the
empty value coverage is uniquely represented by the incomplete cover and the last two
that the complete cover uniquely represents the exhaustive set of patterns. Note that
the third of these theorems follows immediately from the definition of Abs_fn. With
these  theorems  in  mind,  the  definition  of  coverage  may  be  written  down  by  cases  on  the 
structure  of  Pattern,  following  the  definition  of  matches: 

For  the  primitive  patterns  of  variables  and  special  constants  we  have 

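┌─ CPrimitive ───────────────────────────────
│ p : Pattern; c : Cover
├────────────────────────────────────────────
│ p = unitp ∧ c = complete
│ ∨
│ p ∈ ran var ∧ c = complete
│ ∨
│ p ∈ ran sconstp ∧ c = incomplete
└────────────────────────────────────────────
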
For constructed patterns a construct coverage will normally be produced in which the
first  element  corresponds  to  the  constructor  and  the  second  to  the  coverage  provided 
by  the  parameter.  To  satisfy  the  second  theorem  above,  it  is  necessary  to  test  for  an 
incomplete  parameter  coverage  while  to  satisfy  the  fourth  theorem  the  complete  cover 
must  be  returned  when  the  set  of  patterns  is  exhaustive.  For  a  datatype  containing 
only  one  constructor  this  is  the  case  when  the  parameter  provides  a  complete 
coverage  and  this  case  must  be  tested. 

┌─ CConstructions ───────────────────────────
│ p : Pattern; c : Cover
├────────────────────────────────────────────
│ p ∈ ran consp
│ datatype con = {con} ∧ parcover = complete ∧ c = complete
│ ∨
│ parcover = incomplete ∧ c = incomplete
│ ∨
│ ¬(parcover = incomplete ∨ datatype con = {con} ∧ parcover = complete)
│   ∧ c = construct {con ↦ parcover}
│ where
│   con == (consp⁻¹ p).con
│   parcover == coverage (consp⁻¹ p).patt
└────────────────────────────────────────────

For  tuples,  it  is  also  necessary  to  test  for  complete  and  incomplete  partial  results,  which 
is done using the CPair function as follows:

CPair == λ c₁, c₂ : Cover
         • μ c : Cover
           | c₁ = complete ∧ c₂ = complete ∧ c = complete
             ∨
             (c₁ = incomplete ∨ c₂ = incomplete) ∧ c = incomplete
             ∨
             ¬(c₁ = incomplete ∨ c₂ = incomplete)
             ∧
             ¬(c₁ = complete ∧ c₂ = complete)
             ∧ c = pair {c₁ ↦ c₂}
           • c

Note  that  we  are  now  getting  to  the  stage  where  the  if  then  else  notation  of  the 
implementation  language  would  be  more  natural  and  more  compact. 

With  this  function  the  coverage  for  tuples  is  given  by 


┌─ CTuples ──────────────────────────────────
│ p : Pattern; c : Cover
├────────────────────────────────────────────
│ p ∈ ran tupp
│ #tp = 2 ∧ c = CPair(coverage(tp 1), coverage(tp 2))
│ ∨
│ #tp > 2 ∧ c = CPair(coverage(tp 1), coverage(tupp(tl tp)))
│ where
│   tp == tupp⁻¹ p
└────────────────────────────────────────────

The coverage function is given by the constraint:

coverage = λ p : Pattern
           • μ c : Cover | CPrimitive ∨ CConstructions ∨ CTuples • c

Note that this function is total as the disjunction covers all possible Pattern constructors.
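As remarked above, the case analysis reads more naturally with if-then-else. The following Python sketch shows CPair and coverage in that style. The encoding is assumed for illustration only: patterns are tagged tuples (`("unit",)`, `("var", name)`, `("sconst", k)`, `("cons", con, patt)`, `("tuple", [p1, p2, ...])`), covers are the strings `"complete"` and `"incomplete"` or tagged tuples over frozensets, and the hypothetical `datatype` argument maps each constructor to the set of constructors of its datatype.

```python
COMPLETE, INCOMPLETE = "complete", "incomplete"


def cpair(c1, c2):
    """CPair: collapse to a leaf when possible, otherwise build a pair node."""
    if c1 == INCOMPLETE or c2 == INCOMPLETE:
        return INCOMPLETE
    if c1 == COMPLETE and c2 == COMPLETE:
        return COMPLETE
    return ("pair", frozenset({(c1, c2)}))


def coverage(p, datatype):
    """Coverage of a single pattern, by cases on the pattern structure."""
    kind = p[0]
    if kind in ("unit", "var"):
        return COMPLETE
    if kind == "sconst":  # a special constant matches no value
        return INCOMPLETE
    if kind == "cons":
        _, con, patt = p
        parcover = coverage(patt, datatype)
        if parcover == INCOMPLETE:
            return INCOMPLETE
        if parcover == COMPLETE and datatype[con] == {con}:
            return COMPLETE  # sole constructor with a fully covered parameter
        return ("construct", frozenset({(con, parcover)}))
    # kind == "tuple": first element against the rest of the tuple
    tp = p[1]
    rest = tp[1] if len(tp) == 2 else ("tuple", tp[1:])
    return cpair(coverage(tp[0], datatype), coverage(rest, datatype))


# Hypothetical datatypes: three colours, and a single-constructor "box".
colours = {"red", "green", "blue"}
datatype = {**{c: colours for c in colours}, "box": {"box"}}
red_y = ("tuple", [("cons", "red", ("unit",)), ("var", "y")])
red_y_cover = coverage(red_y, datatype)
```

With these assumptions `red_y_cover` is a pair node binding the construct coverage for `red` to a complete second element.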

5.2  The  union  function 

As  with  coverage,  the  union  function  must  satisfy  the  following  theorems: 

Union
⊢
Abs_fn(c, type) = Abs_fn(c₁, type) ∪ Abs_fn(c₂, type)

Union
⊢
c = c₂ ⇒ Abs_fn(c, type) = Abs_fn(c₂, type)

Union
⊢
c ≠ c₂ ⇒ Abs_fn(c, type) ≠ Abs_fn(c₂, type)

Union
⊢
c = complete ⇒ Abs_fn(c, type) = type

Union
⊢
c ≠ complete ⇒ Abs_fn(c, type) ≠ type

The  principal  objective  is  to  make  the  union  function  correspond  to  the  operation  of 
uniting  sets  of  values.  The  additional  constraints  are  that  the  coverage  must  only 
change  when  the  underlying  sets  of  values  change  and  that  it  is  necessary  to  have  a 
unique  measure  for  the  exhaustive  set  of  patterns. 


For  the  abstract  algorithm  design,  consider  first  the  case  of  a  set  of  constructed 
patterns.  These  are  represented  by  a  construct  coverage  which  measures  the  coverage 
of  values  associated  with  each  of  the  constructors  in  the  datatype.  So  the  coverage 
will be represented by some function, F, of the form F = {cᵢ ↦ Pᵢ}, where the cᵢ are the
constructors of a datatype and the Pᵢ represent sets of parameter values covered so far.
Adding a new pattern will give rise to an extra coverage represented in the same
way by the maplet cⱼ ↦ P. The new coverage, F', will depend on whether cⱼ is a member
of the domain of F or not. If it is, the coverage provided by the parameter of the new
pattern is united with the coverage provided by the parameters of the patterns
already processed which use that constructor. This is expressed formally as
F' = F ⊕ {cⱼ ↦ Pⱼ ∪ P}. When a new pattern introduces a constructor for the first
time, the coverage is simply added to those already there. In this case,
F' = F ∪ {cⱼ ↦ P}.

Calculating  the  united  coverage  in  this  way  will  cause  F'  to  differ  from  F  exactly  when 
a new constructor is added to the coverage or when a parameter coverage, parc, changes.
This  satisfies  the  requirement  of  the  third  goal  for  this  case.  For  the  fifth  goal,  a  test 
for  completeness  is  required,  expressed  as  follows: 

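┌─ Constructor_complete ─────────────────────
│ F' : CON ⇻ Cover
├────────────────────────────────────────────
│ ∃ cs : ran datatype • F' = {c : cs • c ↦ complete}
└────────────────────────────────────────────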


Using  this  schema  and  generalising  to  cover  the  case  of  merging  sets  of  patterns,  the 
union  constructor  case  is  defined  as  follows: 

┌─ UConstructions ───────────────────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ c₁ ∈ ran construct ∧ c₂ ∈ ran construct
│ Constructor_complete ∧ c = complete
│ ∨
│ ¬Constructor_complete ∧ c = construct F'
│ where
│   F == construct⁻¹ c₁
│   f == construct⁻¹ c₂
│   f₁ == dom f ⩤ F
│   f₂ == dom F ⩤ f
│   f₃ == {c : dom F ∩ dom f • c ↦ union(F c, f c)}
│   F' == (f₁ ∪ f₂ ∪ f₃)
└────────────────────────────────────────────
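The constructor case can be sketched in Python as a merge of two finite maps followed by the completeness test. This is a partial illustration under stated assumptions: only leaf and construct coverages are handled (pair coverages are omitted), the treatment of the leaf cases is assumed rather than taken from the text, and the hypothetical `datatype` argument maps each constructor to the set of constructors of its datatype.

```python
COMPLETE, INCOMPLETE = "complete", "incomplete"


def union(c1, c2, datatype):
    """Union of two leaf or construct coverages (pair coverages not handled)."""
    # Assumed leaf behaviour: complete absorbs, incomplete is the identity.
    if COMPLETE in (c1, c2):
        return COMPLETE
    if c1 == INCOMPLETE:
        return c2
    if c2 == INCOMPLETE:
        return c1
    F, f = dict(c1[1]), dict(c2[1])
    merged = {**F, **f}                  # constructors found on one side only
    for con in F.keys() & f.keys():      # constructors found on both sides
        merged[con] = union(F[con], f[con], datatype)
    constructors = datatype[next(iter(merged))]
    # Completeness test: every constructor present, each fully covered.
    if set(merged) == set(constructors) and all(
            v == COMPLETE for v in merged.values()):
        return COMPLETE
    return ("construct", frozenset(merged.items()))


def con_cover(name):
    # Hypothetical helper: coverage of one constructor with a complete parameter.
    return ("construct", frozenset({(name, COMPLETE)}))


colours = {"red", "green", "blue"}
datatype = {c: colours for c in colours}
red_green = union(con_cover("red"), con_cover("green"), datatype)
all_three = union(red_green, con_cover("blue"), datatype)
```

Uniting `red_green` with a coverage it already contains returns an equal value, which is the property the third goal asks for in this case.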


This is fairly straightforward, but the same technique may be used to deal with tuple
patterns, which are represented by pair coverages. In this case, one coverage stands for
the set of values covered in the first element of the tuple, which is bound to the set
of values covered by the rest of the tuple in a similar way to that in which the
parameter of a constructed pattern is bound to its constructor. Consequently a pair
coverage stands for a set of values which may be represented in the form F = {Aᵢ ↦ Bᵢ},
where the Aᵢ and Bᵢ now stand for sets of values. Each element of the set F stands for a
set of tuple values formed by taking one element out of an A and combining it with any
element out of the corresponding B. A new pattern gives rise to a set of values covered
of the form α ↦ β. For a given element of F, say A ↦ B, the standard rules for taking
unions of Cartesian products should give rise to the following extra elements in F':

A \ α ↦ B
A ∩ α ↦ B ∪ β
α \ A ↦ β

Applying this procedure to every element in F gives the new coverage, F'. If any of the
sets are empty, this element represents no values and may be discarded. To meet the
other goals it is important that F' should differ from F exactly when new values have
been added to the set of values covered. This objective is attained by keeping the Aᵢ
disjoint: an alteration to one of them cannot then produce a value which is already
accounted for by the other members of F. If this is the case, and if the first two
operations above correspond to the introduction of new values, that is, if B ∪ β ≠ B,
alterations must correspond to new values. If all the Aᵢ in F are disjoint, elements
formed by these operations will also be disjoint. However, the third operation may
give rise to intersections, but these may be eliminated by adding one element
(α \ ⋃(dom F)) ↦ β to the function as a whole, rather than carrying out the operation
for each element.

As  an  example  of  this  process,  consider  the  following  sequence  of  patterns: 

(red,  y) 

(green,  blue) 

(x,  red) 

(green,  green) 

(blue,  y) 

These  patterns  match  the  following  pairs  of  values: 

red ↦ colour
green ↦ blue
colour ↦ red
green ↦ green
blue ↦ colour

Uniting  the  pairs  according  to  these  rules  gives  successively: 
{red ↦ colour, green ↦ blue}
{red ↦ colour, green ↦ {red, blue}, blue ↦ red}
{{red, green} ↦ colour, blue ↦ red}
colour ↦ colour
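The worked example can be replayed with a small Python sketch that operates directly on sets of values rather than on Cover trees. This is a simplified model, not the specification itself: F is a dictionary from disjoint first-element sets A to second-element sets B, and the function name `add_pattern` and the step that merges elements whose second set is complete are assumptions chosen to match the sequence shown above.

```python
def add_pattern(F, a, b, full):
    """Unite the product a x b into F, keeping the first-element sets disjoint."""
    new = {}
    for A, B in F.items():
        if A - a:
            new[A - a] = B          # A \ a  |->  B
        if A & a:
            new[A & a] = B | b      # A n a  |->  B u b
    rest = a - frozenset().union(*F)
    if rest:
        new[rest] = b               # (a \ U(dom F))  |->  b
    # Unite the first elements whose second element has become complete.
    done = [A for A, B in new.items() if B == full]
    if done:
        for A in done:
            del new[A]
        new[frozenset().union(*done)] = full
    return new


def s(*xs):
    return frozenset(xs)


colour = s("red", "green", "blue")
patterns = [
    (s("red"), colour),        # (red, y)
    (s("green"), s("blue")),   # (green, blue)
    (colour, s("red")),        # (x, red)
    (s("green"), s("green")),  # (green, green)
    (s("blue"), colour),       # (blue, y)
]

states = []
F = {}
for a, b in patterns:
    F = add_pattern(F, a, b, colour)
    states.append(F)
```

The entries of `states` from the second onwards correspond to the four lines of the sequence above, ending with the single element that maps the whole of colour to the whole of colour.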


In  formalising  this  process  it  is  necessary  to  specify  operations  representing  the 
difference  and  intersection  of  sets  of  values,  just  as  it  is  already  required  to  form 
unions.  For  differences  and  intersections,  it  is  necessary  to  distinguish  the  empty  set, 
which  corresponds  to  the  incomplete  cover.  The  difference  and  intersection  functions 
are  both  defined  recursively  and  their  types  are  given  by: 

difference, intersection : (Cover × Cover) ⇸ Cover


Difference and intersection of tuples involves sets of pairs of differences, so it is useful
to define some schemas to provide the necessary signatures as follows:


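┌─ Pair_element ─────────────────────────────
│ A₁, A₂, α₁, α₂ : Cover
│ res_pairs : ℙ (Cover × Cover)
└────────────────────────────────────────────

┌─ Pair_set ─────────────────────────────────
│ F, result_pairs : ℙ (Cover × Cover)
│ α₁, α₂ : Cover
└────────────────────────────────────────────

In these schemas, A₁ ↦ A₂ ∈ F will be transformed into res_pairs as a result of adding
a pair coverage element α₁ ↦ α₂. Applying this process to all elements of F gives a new
set, called result_pairs.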


The  intersection  function  is  the  simplest:  for  one  element  of  a  pair  set  the 
intersection  is  given  by: 


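┌─ Int_element ──────────────────────────────
│ Pair_element
├────────────────────────────────────────────
│ res_pairs = {x ↦ y}
│ where
│   x == intersection(A₁, α₁)
│   y == intersection(A₂, α₂)
└────────────────────────────────────────────

The set of pairs is obtained by adding together all the elements and discarding any
that are empty:

┌─ Int_pair ─────────────────────────────────
│ Pair_set
├────────────────────────────────────────────
│ result_pairs = {Int_element; x, y : Cover
│                 | A₁ ↦ A₂ ∈ F
│                   ∧ x ↦ y ∈ res_pairs ∧ x ≠ incomplete ∧ y ≠ incomplete
│                 • x ↦ y
│                }
└────────────────────────────────────────────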


With  this,  the  definition  of  intersection  can  be  given  by  cases  on  the  structure  of  Cover 
as  follows: 

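┌─ IPrimitive ───────────────────────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ (c₁ = complete ∧ c = c₂) ∨ (c₁ = incomplete ∧ c = incomplete)
│ ∨
│ (c₂ = complete ∧ c = c₁) ∨ (c₂ = incomplete ∧ c = incomplete)
└────────────────────────────────────────────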


A  complete  cover  accounts  for  all  values  and  so  the  result  is  unchanged.  An 
incomplete  cover  corresponds  to  the  empty  set  which  cannot  have  an  intersection 
with  any  set  of  values. 

For  constructed  covers,  the  intersection  is  determined  by  whether  the  constructors 
are  equal  and  if  so,  whether  the  parameter  values  intersect: 

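┌─ IConstructions ───────────────────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ c₁ ∈ ran construct ∧ c₂ ∈ ran construct
│ F' = {} ∧ c = incomplete
│ ∨
│ F' ≠ {} ∧ c = construct F'
│ where
│   F == construct⁻¹ c₁
│   F' == {c : dom F; parc, int : Cover
│          | c ↦ parc ∈ construct⁻¹ c₂
│            ∧ int = intersection(F c, parc) ≠ incomplete
│          • c ↦ int
│         }
└────────────────────────────────────────────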


For  tuples,  any  member  of  the  pair  coverages  may  intersect: 
┌─ ITuples ──────────────────────────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ c₁ ∈ ran pair ∧ c₂ ∈ ran pair
│ F' = {} ∧ c = incomplete
│ ∨
│ F' ≠ {} ∧ c = pair F'
│ where
│   F == pair⁻¹ c₁
│   F' == ⋃ {Int_pair | α₁ ↦ α₂ ∈ pair⁻¹ c₂ • result_pairs}
└────────────────────────────────────────────


Finally,  the  intersection  function  is  given  by  the  constraint: 

intersection = λ c₁, c₂ : Cover
               • μ c : Cover
                 | IPrimitive ∨ IConstructions ∨ ITuples
                 • c
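The three cases of intersection translate fairly directly into a recursive Python function. The sketch below assumes an encoding that is not part of the specification: covers are the strings `"complete"` and `"incomplete"`, or tagged tuples `("construct", frozenset of (constructor, cover))` and `("pair", frozenset of (cover, cover))`. It also assumes that both arguments are compatible with the same type, so that two non-leaf covers always carry the same tag.

```python
COMPLETE, INCOMPLETE = "complete", "incomplete"


def intersection(c1, c2):
    """Cover standing for the values represented by both c1 and c2."""
    # Leaf cases: complete leaves the other side unchanged,
    # incomplete stands for the empty set.
    if c1 == COMPLETE:
        return c2
    if c2 == COMPLETE:
        return c1
    if INCOMPLETE in (c1, c2):
        return INCOMPLETE
    (tag, F), (_, f) = c1, c2
    out = set()
    if tag == "construct":
        F, f = dict(F), dict(f)
        for con in F.keys() & f.keys():      # constructors present in both
            x = intersection(F[con], f[con])
            if x != INCOMPLETE:
                out.add((con, x))
    else:                                    # tag == "pair"
        for A1, A2 in F:
            for a1, a2 in f:
                x = intersection(A1, a1)
                y = intersection(A2, a2)
                if x != INCOMPLETE and y != INCOMPLETE:
                    out.add((x, y))          # empty elements are discarded
    return (tag, frozenset(out)) if out else INCOMPLETE


def cons(*names):
    # Hypothetical helper: construct coverage with complete parameters.
    return ("construct", frozenset((n, COMPLETE) for n in names))


red, green, red_green = cons("red"), cons("green"), cons("red", "green")
p1 = ("pair", frozenset({(red_green, COMPLETE)}))
p2 = ("pair", frozenset({(red, green)}))
both = intersection(p1, p2)
```

In this example `p2` lies wholly inside `p1`, so `both` is equal to `p2`.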


The  difference  function  is  similar,  except  that  for  tuples,  the  difference  of  cartesian 
product  sets  is  slightly  more  complicated.  On  one  element  of  a  pair  set,  the 
difference  function  carries  out  the  following  operation: 

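┌─ Diff_element ─────────────────────────────
│ Pair_element
├────────────────────────────────────────────
│ res_pairs = {x ↦ A₂, z ↦ y}
│ where
│   x == difference(A₁, α₁)
│   y == difference(A₂, α₂)
│   z == intersection(A₁, α₁)
└────────────────────────────────────────────

For the pair relation:

┌─ Diff_pair ────────────────────────────────
│ Pair_set
├────────────────────────────────────────────
│ result_pairs = {Diff_element; x, y : Cover
│                 | A₁ ↦ A₂ ∈ F
│                   ∧ x ↦ y ∈ res_pairs ∧ x ≠ incomplete ∧ y ≠ incomplete
│                 • x ↦ y
│                }
└────────────────────────────────────────────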


The  definition  of  difference  is  now  given  by  cases  on  the  structure  of  Cover  as  follows: 
┌─ DPrimitive ───────────────────────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ c₁ = incomplete ∧ c = incomplete
│ ∨
│ c₂ = complete ∧ c = incomplete ∨ c₂ = incomplete ∧ c = c₁
└────────────────────────────────────────────


The  constraints  express  the  fact  that  subtracting  from  the  empty  set  must  leave  the 
empty  set  as  should  subtracting  the  complete  set  of  values.  Subtracting  the  empty 
set  must  leave  the  result  unchanged. 

For  constructors,  the  new  constructor  coverage  is  obtained  from  the  old  one  by 
modifying  the  coverage  function  where  it  has  elements  in  common  with  the 
subtrahend: 

┌─ DConstructions ───────────────────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ c₁ ∈ ran construct ∧ c₂ ∈ ran construct
│ F' = {} ∧ c = incomplete
│ ∨
│ F' ≠ {} ∧ c = construct F'
│ where
│   F == construct⁻¹ c₁
│   F' == {c : dom F; parc, diff : Cover
│          | c ↦ parc ∈ construct⁻¹ c₂
│            ∧ diff = difference(F c, parc) ≠ incomplete
│            ∨
│            c ∉ dom(construct⁻¹ c₂) ∧ diff = F c
│          • c ↦ diff
│         }
└────────────────────────────────────────────


When  both  coverages  are  pairs  the  difference  is  given  by: 

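┌─ DTuples ──────────────────────────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ c₁ ∈ ran pair ∧ c₂ ∈ ran pair
│ F' = {} ∧ c = incomplete
│ ∨
│ F' ≠ {} ∧ c = pair F'
│ where
│   F == pair⁻¹ c₁
│   F' == ⋃ {Diff_pair | α₁ ↦ α₂ ∈ pair⁻¹ c₂ • result_pairs}
└────────────────────────────────────────────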


When  Cj  is  complete,  but  C2  is  not,  it  is  necessary  to  expand  Cj  into  the  appropriate 
representation  of  completeness.  For  constructors,  we  have  the  following: 


┌─ c1_complete_c2_constructor ───────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ c₁ = complete ∧ c₂ ∈ ran construct
│ F' = {} ∧ c = incomplete
│ ∨
│ F' ≠ {} ∧ c = construct F'
│ where
│   conset == μ cs : ran datatype | dom(construct⁻¹ c₂) ⊆ cs • cs
│   F' == {c : conset; parc, diff : Cover
│          | c ↦ parc ∈ construct⁻¹ c₂
│            ∧ diff = difference(complete, parc) ≠ incomplete
│          • c ↦ diff
│         }
└────────────────────────────────────────────


and  for  tuples: 

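┌─ c1_complete_c2_pair ──────────────────────
│ c₁, c₂, c : Cover
├────────────────────────────────────────────
│ c₁ = complete ∧ c₂ ∈ ran pair
│ F' = {} ∧ c = incomplete
│ ∨
│ F' ≠ {} ∧ c = pair F'
│ where
│   F == {complete ↦ complete}
│   F' == ⋃ {Diff_pair | α₁ ↦ α₂ ∈ pair⁻¹ c₂ • result_pairs}
└────────────────────────────────────────────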


This  gives  for  the  difference  function: 

difference = λ c₁, c₂ : Cover
             • μ c : Cover
               | DPrimitive ∨ DConstructions ∨ DTuples
                 ∨ c1_complete_c2_constructor ∨ c1_complete_c2_pair
               • c
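As an informal illustration, and not as part of the specification, the primitive and
constructor cases of difference can be sketched in Python. The encoding is an assumption
made for this sketch only: True and False stand for complete and incomplete (as in the
BOOL alternative of the Algol68 COVER mode in section 7) and a constructor cover is a
dict from constructor name to parameter cover. Pair covers and the expansion of a
complete c₁ are left out, since they need Diff_pair and the constructor set of the
datatype.

```python
COMPLETE, INCOMPLETE = True, False  # assumed encoding of the primitive covers


def difference(c1, c2):
    """Cover for the values in c1 that are not in c2 (pair covers not handled)."""
    # Primitive cases: nothing is left when subtracting from the empty cover or
    # when subtracting everything; subtracting nothing leaves c1 unchanged.
    if c1 is INCOMPLETE or c2 is COMPLETE:
        return INCOMPLETE
    if c2 is INCOMPLETE:
        return c1
    if c1 is COMPLETE:
        # Would need the full constructor set of the datatype (conset).
        raise NotImplementedError("expansion of a complete cover is omitted")
    # Constructor case: adjust the parameter cover of each shared constructor.
    result = {}
    for con, cover in c1.items():
        diff = difference(cover, c2[con]) if con in c2 else cover
        if diff is not INCOMPLETE:
            result[con] = diff
    return result if result else INCOMPLETE
```

For example, subtracting the cover {"A": True} from {"A": True, "B": True} leaves
{"B": True}, while subtracting a cover from itself gives the incomplete cover.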


The union function is similar in structure to difference, but this time it is also
necessary to take into account the unique representation of completeness. (The
intersection and difference functions cannot generate a complete cover, although they
may generate an incomplete one. For the union function, the converse is true.) To
express completeness, it is necessary to define a relation UNION, corresponding to a
distributed union function:


⋃ : seq Cover → Cover

⋃ = λ sc : seq Cover
    • μ c : Cover
      | sc = ⟨⟩ ∧ c = incomplete
        ∨
        sc ≠ ⟨⟩ ∧ c = union(hd sc, ⋃(tl sc))
      • c


_UNION_ == {cs : ℙ Cover; c : Cover
            | ∃ sc : seq Cover
              | ran sc = cs ∧ #sc = #cs
              • c = ⋃ sc
            • c ↦ cs
           }

This  is  rather  unsatisfactory  as  UNION  has  had  to  be  defined  as  a  relation,  rather  than 
a  function.  This  is  because  the  definition  depends  on  the  order  in  which  the  covers 
are  united,  specified  by  the  sequence  of  covers,  sc,  above.  For  type-compatible  covers, 
the  result  should  be  independent  of  the  order,  corresponding  to  the  fact  that  set  union 
distributes. 
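
The order dependence can be seen in a small Python sketch of the recursive definition of
the distributed union. The name distributed_union and the use of Python sets as a
stand-in for covers are assumptions made for illustration. With a commutative and
associative operation such as set union, every ordering of the same covers gives the
same result, which is the property one would wish to establish for union on
type-compatible covers.

```python
from itertools import permutations


def distributed_union(union, incomplete, sc):
    """Fold a sequence: the empty sequence gives incomplete, otherwise
    union(hd sc, distributed_union(tl sc))."""
    if not sc:
        return incomplete
    return union(sc[0], distributed_union(union, incomplete, sc[1:]))


# Stand-in covers: plain sets of values, united with ordinary set union.
covers = [frozenset({1}), frozenset({2, 3}), frozenset({3, 4})]
results = {
    distributed_union(frozenset.union, frozenset(), list(order))
    for order in permutations(covers)
}
# All six orderings collapse to a single result for this stand-in union.
```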

With  the  UNION  relation,  the  completeness  of  a  set  of  tuples  may  be  expressed  as 
follows: 

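┌─ Union_complete ─────────────────────────────────────
│ c : Cover; F' : ℙ (Cover × Cover)
├──────────────────────────────────────────────────────
│ cc = complete ∧ c = complete
│ ∨
│ cc = incomplete ∧ c = pair F'
│ ∨
│ cc ≠ incomplete ∧ cc ≠ complete
│ ∧ c = pair (completed ⩤ F' ∪ {cc ↦ complete})
│ where
│   completed : ℙ Cover; cc : Cover
│   completed = {c : dom F' | F' c = complete}
│   cc UNION completed
└──────────────────────────────────────────────────────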


In this schema, if any of the second elements is complete, then the corresponding
first elements may be united. If the result is complete then the set of pairs is also
complete.

The  union  function  can  now  be  defined  in  an  analogous  way  to  difference: 

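┌─ Union_element ──────────────────────────────────────
│ Pair_element
├──────────────────────────────────────────────────────
│ z = A₂ ∧ res_pairs = {A₁ ↦ A₂}
│ ∨
│ z ≠ A₂ ∧ res_pairs = {x ↦ A₂, y ↦ z}
│ where
│   x == difference(A₁, α₁)
│   y == intersection(A₁, α₁)
│   z == union(α₂, A₂)
└──────────────────────────────────────────────────────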


For  the  pair  relation,  a  constraint  to  express  the  third  difference  operation  for  the  set 
of  pairs  as  a  whole  must  be  added: 

┌─ Union_pair ─────────────────────────────────────────
│ Pair_set
├──────────────────────────────────────────────────────
│ remainder = incomplete ∧ result_pairs = pairs1
│ ∨
│ remainder ≠ incomplete ∧ result_pairs = pairs1 ∪ {remainder ↦ α₂}
│ where
│   pairs1 : ℙ (Cover × Cover); cf, remainder : Cover
│   pairs1 = {Union_element; x, y : Cover
│             | A₁ ↦ A₂ ∈ F
│               ∧ x ↦ y ∈ res_pairs ∧ x ≠ incomplete ∧ y ≠ incomplete
│             • x ↦ y
│            }
│   cf UNION dom F
│   remainder = difference(α₁, cf)
└──────────────────────────────────────────────────────


With these schemas, forming the union of tuples is given by:


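┌─ UTuples ────────────────────────────────────────────
│ c₁, c₂, c : Cover
├──────────────────────────────────────────────────────
│ c₁ ∈ ran pair ∧ c₂ ∈ ran pair
│ Union_complete
│ where
│   F  == pair⁻¹ c₁
│   F' == ⋃ {Union_pair | α₁ ↦ α₂ ∈ pair⁻¹ c₂ • result_pairs}
└──────────────────────────────────────────────────────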


The primitive coverages are easily dealt with. If the coverage is already complete, adding
more values does not increase it; if the complete coverage is added, the result can only
be complete; if the incomplete coverage is added, the set of values covered is
unchanged:

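┌─ UPrimitive ─────────────────────────────────────────
│ c₁, c₂, c : Cover
├──────────────────────────────────────────────────────
│ (c₁ = complete ∨ c₂ = complete) ∧ c = complete
│ ∨
│ (c₁ = incomplete ∧ c = c₂) ∨ (c₂ = incomplete ∧ c = c₁)
└──────────────────────────────────────────────────────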


And  now  the  union  function  is  given  by 

union = λ c₁, c₂ : Cover
        • μ c : Cover
          | UPrimitive ∨ UConstructions ∨ UTuples
          • c

Z_match_funs keeps CPrimitive, CConstructions, CTuples,
                   UPrimitive, UConstructions, UTuples,
                   intersection, IPrimitive, IConstructions, ITuples,
                   difference, DPrimitive, DConstructions, DTuples


6  Proof  opportunities 


Z_match_spec : Module
Z_match_AI   : Module
Z_match_ops  : Module
Z_match_funs : Module

The  section  title  is  intended  to  convey  the  concept  of  selective  proof:  proof  should 
be  used  to  increase  confidence  in  those  parts  of  the  design  which  need  further 
investigation, rather than calling for the proof of everything from first principles. In a
specification  of  this  nature,  there  are  two  sorts  of  proof  opportunity,  namely  those 
associated  with  the  consistency  of  the  specification  and  those  associated  with  the 
refinement  process  itself.  In  the  course  of  developing  this  implementation,  the 
opportunity  has  been  taken  to  point  out  various  consistency  proofs  as  the  need  arises. 
For example, datatypes should be satisfiable, μ-terms should stand for uniquely
existing  values  and  the  use  of  function  arrows  should  be  compatible  with  the 
axiomatic  definition  of  the  functions.  The  proof  requirements  for  consistency  are 
relatively  trivial  and  do  not  add  greatly  to  the  understanding  of  the  problem. 
Consequently  we  shall  concentrate  on  the  proof  opportunities  generated  by  the 
refinement  itself  and  attempt  to  show  the  main  structure  of  the  proofs.  Even  here, 
the  sea  of  theorems  to  prove  is  both  wide  and  deep,  so  we  shall  concentrate  on  one  or 
two  example  cases. 

6.1  Proofs  for  the  coverage  function 

The  main  theorem  to  prove  is 

Pattern_Compatible
⊢
Abs_fn(coverage(p), type) = valcover(p, type)

The theorem contains p and type as parameters, so it has to be shown for all values of
these types. The constraint in the hypothesis ensures that the type is determined by
the pattern in most cases, so the main proof is by induction over the structure of Pattern,
with type corresponding.

Establishing a theorem of the form P(p), where p is a Pattern and P some predicate on p,
requires establishing the following base cases:

⊢ P(unitp)

sc : SCONST ⊢ P(sconstp(sc))

v : Variable ⊢ P(var(v))


and the following induction steps:


Cons_patt; P(patt) ⊢ P(consp(θCons_patt))
tp : tuple Pattern; ∀ p : ran tp • P(p) ⊢ P(tupp(tp))
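
The induction scheme has one obligation per constructor of Pattern, which is the same
shape as a function defined by structural recursion. As an illustration only, the
following Python sketch defines a coverage function with one branch per constructor. It
is loosely modelled on the Algol68 coverage procedure of section 7, not on the Z
definition; the class names, the 0-based constructor index, and the tagged tuples used
for constructor and pair covers are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class UnitP:
    pass


@dataclass(frozen=True)
class SConstP:
    value: Any


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class ConsP:
    index: int  # position of the constructor in its datatype (0-based here)
    ncons: int  # number of constructors in the datatype
    arg: Any    # parameter pattern


@dataclass(frozen=True)
class TupP:
    items: Tuple[Any, ...]  # two or more component patterns


def coverage(p):
    """One branch per constructor of Pattern, as in the induction scheme."""
    if isinstance(p, (UnitP, Var)):
        return True   # base cases: unit and variables are complete
    if isinstance(p, SConstP):
        return False  # base case: special constants are incomplete
    if isinstance(p, ConsP):
        par = coverage(p.arg)  # recursive call on the parameter pattern
        if par is False:
            return False
        if p.ncons == 1 and par is True:
            return True
        covers = [False] * p.ncons
        covers[p.index] = par
        return ("construct", covers)
    if isinstance(p, TupP):
        head = coverage(p.items[0])
        rest = p.items[1:]
        # A longer tuple is treated as a pair of its head and the remaining tuple.
        tail = coverage(rest[0]) if len(rest) == 1 else coverage(TupP(rest))
        return ("pair", [(head, tail)])
    raise TypeError(f"not a pattern: {p!r}")
```

The tuple branch recurses on a shorter tuple, which corresponds to the induction over
the length of the tuple used in the proofs.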

For  the  coverage  theorem,  the  base  cases  are  a  consequence  of  the  following 
theorems  whose  proof  is  immediate: 

PCPrimitive; c : Cover; vals : ℙ Value; p = unitp
⊢
AIPrimitive ∧ CPrimitive ∧ vals = {v : type | MPrimitive}

PCPrimitive; c : Cover; vals : ℙ Value; p ∈ ran sconstp
⊢
AIPrimitive ∧ CPrimitive ∧ vals = {v : type | MPrimitive}

PCPrimitive; c : Cover; vals : ℙ Value; p ∈ ran var
⊢
AIPrimitive ∧ CPrimitive ∧ vals = {v : type | MPrimitive}

For  the  induction  steps,  the  coverage  property  required  may  be  expressed  using  the 
following  schema: 

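┌─ Coverage_property ──────────────────────────────────
│ p : Pattern; type : Type
├──────────────────────────────────────────────────────
│ Pattern_Compatible
│ Abs_fn(coverage(p), type) = valcover(p, type)
└──────────────────────────────────────────────────────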


The  constructor  case  will  then  follow  from  the  theorem: 

p : Pattern; type : Type; c : Cover; vals : ℙ Value
Cons_patt; p = consp θCons_patt
PCConstructions; partype : Type; partype = type_of_con con
Coverage_property[patt/p, partype/type]
⊢
AIConstructions ∧ CConstructions ∧ vals = {v : type | MConstructions}

while the tuple case follows from the theorem:

p : Pattern; type : Type; c : Cover; vals : ℙ Value
tp : tuple Pattern; p = tupp tp
PCTuples
tt : seq Type; tt = {i : dom tp • {tv : tupv⁻¹⦇type⦈ • tv i}}
∀ i : dom tp; p : ran tp; type : ran tt
  | p = tp i ∧ type = tt i
  • Coverage_property
⊢
AITuples ∧ CTuples ∧ vals = {v : type | MTuples}

In  both  of  these  theorems,  the  hypothesis  list  makes  use  of  the  fact  that  the  type  in 
the  induction  hypothesis  must  be  derived  from  the  given  type,  which  is  compatible 
with  the  pattern.  This  is  expressed  by  the  following  theorems: 

p : Pattern; type : Type
Cons_patt; p = consp θCons_patt
Pattern_Compatible[patt/p, partype/type]
⊢
partype = type_of_con con

p : Pattern; type : Type
tp : tuple Pattern; p = tupp tp
tt : seq Type
∀ i : dom tp; p : ran tp; type : ran tt
  | p = tp i ∧ type = tt i
  • Pattern_Compatible
⊢
tt = {i : dom tp • {tv : tupv⁻¹⦇type⦈ • tv i}}

The  proof  of  the  constructor  case  may  be  carried  out  along  the  following  lines.  The 
conclusion  consists  of  a  conjunction  of  three  predicates,  each  of  which  must  be 
shown  to  be  true.  The  second  provides  a  value  for  the  cover,  c,  which  may  be  used  to 
rewrite  the  first  predicate  to  give 

p : Pattern; type : Type; c : Cover; vals : ℙ Value
Cons_patt; p = consp θCons_patt
partype : Type; partype = type_of_con con
PCConstructions; Coverage_property[patt/p, partype/type]
⊢
vals = {val : Abs_fn(coverage patt, type_of_con con) • consv θCons_val}
vals = {val : type_of_con con | patt matches val • consv θCons_val}

This theorem follows immediately from the hypothesis. The tuple case requires an
induction over the length of the tuple, starting with the base case of the tuple size
being 2. The constructor and tuple cases together give the theorem required.

The other requirements on coverage are easily established. As an example, we can try
to establish

Pattern_Compatible; c : Cover; vals : ℙ Value
c = coverage(p) ∧ vals = valcover(p, type)
⊢
c = incomplete ∨ vals ≠ {}

The  base  cases  are  a  consequence  of  the  following  theorems  whose  proof  is 
immediate. 

PCPrimitive; CPrimitive; vals : ℙ Value
p = unitp ∧ vals = {v : type | MPrimitive}
⊢
c ≠ incomplete ∧ vals = {unitv}

PCPrimitive; CPrimitive; vals : ℙ Value
p ∈ ran sconstp ∧ vals = {v : type | MPrimitive}
⊢
c = incomplete

PCPrimitive; CPrimitive; vals : ℙ Value
p ∈ ran var ∧ vals = {v : type | MPrimitive}
⊢
c ≠ incomplete ∧ vals = type

The  induction  property  is 

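┌─ Incomplete_property ────────────────────────────────
│ p : Pattern; type : Type
├──────────────────────────────────────────────────────
│ Pattern_Compatible ⇒ c ≠ incomplete ⇒ vals ≠ {}
│ where
│   c    == coverage(p)
│   vals == valcover(p, type)
└──────────────────────────────────────────────────────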


The  constructor  induction  step  is  as  follows 
p : Pattern; type : Type; c : Cover; vals : ℙ Value
Cons_patt; p = consp θCons_patt
PCConstructions; partype : Type; partype = type_of_con con
Coverage_property[patt/p, partype/type]
⊢
CConstructions ∧ vals = {v : type | MConstructions}
c ≠ incomplete ⇒ coverage patt ≠ incomplete

From the conclusion and the hypothesis it is possible to show that
valcover(patt, type_of_con con) is not empty, from which it follows that vals is not
empty. As before, the tuple case will involve an induction over the size of the tuple.

To  summarise,  the  proofs  of  4  theorems  are  required,  as  listed  in  section  5,  of  which 
the  first  two  have  been  outlined  here.  The  third  theorem  follows  immediately  from 
the definition of Abs_fn while the fourth will have a similar structure to that given
above.  Thus  to  show  the  main  structure  of  the  proof  fully  we  are  required  to  display 
18  theorems:  5  each  for  the  non-trivial  theorems  corresponding  to  the  constructors  of 
Pattern,  2  for  the  type  lemmas  and  one  for  the  third,  trivial,  theorem. 

6.2  Proofs  for  the  union  function 

The  main  theorem  we  have  to  show  is 

Union
⊢
Abs_fn(c, type) = Abs_fn(c₁, type) ∪ Abs_fn(c₂, type)

Two  lemmas  are  needed  for  this,  which  prove  that  the  difference  and  intersection 
functions  have  their  intended  effect.  The  intersection  lemma  is  the  easiest  of  these, 
and  may  be  expressed  using  the  schema  below: 

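┌─ Intersection ───────────────────────────────────────
│ c₁, c₂, c : Cover
│ type : Type
├──────────────────────────────────────────────────────
│ c₁ cover_compatible type ∧ c₂ cover_compatible type
│ c = intersection(c₁, c₂)
└──────────────────────────────────────────────────────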


The  theorem  required  is 

Intersection
⊢
Abs_fn(c, type) = Abs_fn(c₁, type) ∩ Abs_fn(c₂, type)


The proof involves a structural induction over a cross-product of Cover, again with type
corresponding. There are 4 constructors in the datatype and consequently 16 cases to
prove. Of these, the two cases with mixed pair and construct constructors may be
eliminated because they cannot be simultaneously compatible with the type. Of the
14 remaining cases, 12, which deal with c₁ or c₂ being complete or incomplete, are
trivial and follow immediately from the fact that the abstraction function delivers
the full set of values in the type, or the empty set, respectively. As a sample theorem
required for the full proof, take the case of a pair of construct coverages. The necessary
theorem will be written in terms of the following induction property:
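
The case count in the paragraph above can be checked mechanically. The following Python
sketch enumerates the cross-product of the four Cover constructors; the string labels
are simply names chosen for this illustration.

```python
from itertools import product

KINDS = ["complete", "incomplete", "construct", "pair"]
PRIMITIVE = {"complete", "incomplete"}

cases = list(product(KINDS, repeat=2))                    # all (c1, c2) shapes
mixed = [c for c in cases if set(c) == {"construct", "pair"}]
remaining = [c for c in cases if c not in mixed]          # type-compatible shapes
trivial = [c for c in remaining if c[0] in PRIMITIVE or c[1] in PRIMITIVE]
nontrivial = [c for c in remaining if c not in trivial]

print(len(cases), len(mixed), len(remaining), len(trivial), nontrivial)
# prints: 16 2 14 12 [('construct', 'construct'), ('pair', 'pair')]
```

Only the construct/construct and pair/pair cases need a genuine induction step.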

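┌─ Intersect_property ─────────────────────────────────
│ c₁, c₂, c : Cover; type : Type
├──────────────────────────────────────────────────────
│ Intersection ⇒ vals = vals₁ ∩ vals₂
│ where
│   vals₁ == Abs_fn(c₁, type)
│   vals₂ == Abs_fn(c₂, type)
│   vals  == Abs_fn(c, type)
└──────────────────────────────────────────────────────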


With  this,  the  constructor  induction  step  is 

c₁, c₂, c : Cover; vals₁, vals₂, vals : ℙ Value; type : Type
F₁, F₂ : CON ⇸ Cover; c₁ = construct F₁; c₂ = construct F₂
AI₁; AI₂; AI; type₁ = type₂ = type
CCConstructions₁; CCConstructions₂
∀ con : dom F₁ ∪ dom F₂; type : Type; c₁, c₂, c : Cover
  | type = type_of_con con ∧ c₁ = F₁ con ∧ c₂ = F₂ con
  • Intersect_property
⊢
IConstructions ⇒ vals = vals₁ ∩ vals₂

This impressive list of hypotheses is generated quite mechanically. The theorem
may be proved by showing ∀ v : vals • v ∈ vals₁ ∧ v ∈ vals₂. This is straightforward,
but rather tedious, and complicated by the extra test on incompleteness.

As  a  final  sample,  the  proof  that  an  increment  to  the  coverage  implies  an  increment 
to  the  values  covered  will  be  exhibited.  The  theorem  to  prove  is 

Union
⊢
c ≠ c₂ ⇒ Abs_fn(c, type) ≠ Abs_fn(c₂, type)

The  base  cases  are  relatively  easily  established,  so  we  shall  consider  the  induction 
steps.  As  before  define  a  schema  to  express  the  induction  hypothesis: 
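┌─ Increment_property ─────────────────────────────────
│ c₁, c₂, c : Cover; type : Type
├──────────────────────────────────────────────────────
│ Union ⇒ c ≠ c₂ ⇒ vals ≠ vals₂
│ where
│   vals  == Abs_fn(c, type)
│   vals₂ == Abs_fn(c₂, type)
└──────────────────────────────────────────────────────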


Taking  the  tuple  induction  step  this  time,  the  theorem  to  be  proved  is  as  follows: 

c₁, c₂, c : Cover; vals₁, vals₂, vals : ℙ Value; type : Type
F₁, F₂ : 𝔽 (Cover × Cover); c₁ = pair F₁; c₂ = pair F₂
AI₁; AI₂; AI; type₁ = type₂ = type
CCTuples₁; CCTuples₂
hdtype, tltype : Type; hdtype = HD type ∧ tltype = TL type
∀ c₁ : dom F₁; c₂ : dom F₂; type : Type; c : Cover
  | type = hdtype
  • Increment_property
∀ c₁ : ran F₁; c₂ : ran F₂; type : Type; c : Cover
  | type = tltype
  • Increment_property
⊢
UTuples ⇒ c ≠ c₂ ⇒ vals ≠ vals₂

This theorem may be established by showing that ∃ v : vals₁ • v ∉ vals₂. As with all the
tuple cases, this is not particularly easy to prove although the existential witness is
easy to find.

To  summarise  for  the  union  case,  five  proofs  are  required,  as  listed  in  section  5.2,  of 
which  the  second  and  fourth  are  immediate.  The  first  proof  breaks  down  into  the 
intersection,  difference  and  union  cases,  each  based  on  a  cross  product  of  Cover  each 
requiring  the  proof  of  16  subsidiary  cases,  48  in  all.  There  will  also  be  2  subsidiary 
lemmas  for  manipulating  the  type  in  the  induction  property,  just  as  for  coverage.  The 
third  and  fifth  theorems  also  give  rise  to  16  cases  so  there  are  82  subsidiary  goals  to 
establish.  Of  these  the  tuple  cases  are  relatively  complicated  and  would  need  to  be 
broken  down  into  further  goals. 
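
The total of 82 can be reproduced with a few lines of Python. The sketch assumes, as the
total implies, that the third and fifth theorems give rise to 16 cases each; the
variable names are chosen for this illustration.

```python
CASES_PER_CROSS_PRODUCT = 4 * 4  # pairs of Cover constructors

first_theorem = 3 * CASES_PER_CROSS_PRODUCT    # intersection, difference, union
type_lemmas = 2                                # manipulating the type, as for coverage
third_and_fifth = 2 * CASES_PER_CROSS_PRODUCT  # 16 cases for each theorem

total = first_theorem + type_lemmas + third_and_fifth
print(first_theorem, total)  # prints: 48 82
```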


7  The  implementation 


This section gives the text of the Algol68 module which implements the design
specified. It is included to give some idea of the formal distance between the design
specification derived in section 5 and an actual implementation language. The
implementation is part of an ML interpreter and so covers the aspects which were
simplified in the design specification (apart from the exact treatment of the special
constants). The main structure of the implementation follows fairly closely that of
the specification inasmuch as it is easy to relate the parts of the implementation to
the parts of the specification. The Z datatypes Pattern and Cover are implemented by
the Algol68 pattern and cover respectively and the coverage and union functions have
the same name. The checking operation, Check_op_2, is implemented by the Algol68
addpatt and the test in the final operation is given by the procedure comp. However, the
formal distance, in the sense of the theorems necessary to demonstrate the
refinement, is still large.

In particular, further steps of data refinement have been undertaken. The constructor
coverage measure, which in Z has been represented by CON ⇸ Cover, has been data
refined into ℕ ⇸ Cover, which is implemented as the Algol68 array of cover, []cover.
In addition, the implementation uses a total function from the set of constructors in the
datatype, rather than a partial function whose domain is the set of constructors
already encountered. In the implementation, the constructors which have not been
encountered are given an incomplete coverage.
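
The relationship between the two representations can be sketched in Python. The
functions retrieve and represent are hypothetical names for this illustration: a Python
dict plays the part of the partial function, a list plays the part of the array, and
False stands for the incomplete coverage.

```python
def retrieve(constructors, array):
    """Abstract partial map represented by a total array of covers.

    Entries holding the incomplete cover (False) are not in the domain."""
    return {con: cov for con, cov in zip(constructors, array) if cov is not False}


def represent(constructors, partial):
    """Total array for a partial map; missing constructors become incomplete."""
    return [partial.get(con, False) for con in constructors]
```

For a datatype with constructors nil and cons, the partial map {"cons": True} is held as
the array [False, True], and retrieve recovers the original map from it.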

Similarly, for tuples, the Z representation of pair ⟨⟨𝔽 (Cover × Cover)⟩⟩ is implemented
by the Algol68 mode PAIR = struct (cover a, b, ref pair next). In this case the
abstract sets are being implemented by linked lists, with a natural implementation
of universal quantifiers in terms of loops.
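
A minimal Python sketch of this refinement step follows. The class Pair mirrors the
fields of the Algol68 mode; pairs_of and for_all are hypothetical helpers introduced
here to show the abstraction to a set and a universal quantifier written as a loop.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Pair:
    a: Any
    b: Any
    next: Optional["Pair"] = None  # None plays the part of NIL


def pairs_of(cell):
    """Abstract set of (a, b) pairs represented by a linked list."""
    result = set()
    while cell is not None:
        result.add((cell.a, cell.b))
        cell = cell.next
    return result


def for_all(cell, predicate):
    """Universal quantification over the pairs, implemented as a loop."""
    while cell is not None:
        if not predicate(cell.a, cell.b):
            return False
        cell = cell.next
    return True
```

Note that two different lists can represent the same abstract set, since the order of the
cells and any repeated pairs are lost by pairs_of.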

In  the  union  function,  there  has  been  some  operation  refinement  of  the  test  for 
completion,  the  calculation  of  the  remainder  coverage  and  the  test  for  whether  the 
coverage  has  changed.  These  are  all  done  during  the  execution  of  the  main  loop  in 
the  function,  rather  than  sequentially  as  would  result  from  a  simple-minded 
implementation  of  the  specification.  As  a  result  the  union  function  delivers  both  the 
new  coverage  and  an  indication  of  whether  it  has  changed. 
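
The idea of delivering the new coverage together with a change indicator can be sketched
in Python for the primitive and constructor cases. This is an illustration under stated
assumptions, not a transcription of the Algol68 procedure: True and False stand for
complete and incomplete, a constructor cover is a list with one entry per constructor,
pair covers are omitted, and the case of an incomplete c1 follows the comment in the
listing ("result unchanged").

```python
def union(c1, c2):
    """Return (cover, differs); differs says whether cover differs from c2."""
    if c2 is True:
        return True, False             # c2 complete: nothing can be added
    if c2 is False:
        return c1, c1 is not False     # changed unless c1 is also incomplete
    if c1 is True:
        return True, True              # c2 was not complete, the result is
    if c1 is False:
        return c2, False               # nothing added to c2
    # Constructor covers: unite element by element in a single pass,
    # collecting the completeness test and the change indicator on the way.
    results = [union(a, b) for a, b in zip(c1, c2)]
    covers = [cover for cover, _ in results]
    if all(cover is True for cover in covers):
        return True, True              # unique representation of completeness
    if any(differs for _, differs in results):
        return covers, True
    return c2, False
```

A caller such as addpatt can then report a redundant pattern when the pattern's own
coverage is not incomplete and the indicator is False.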

ml_testmatch:

ml_modes     : Module
ml_runtime   : Module
ml_comp_mode : Module


MODE COVER,
     CONSTRUCT = STRUCT (REF[]COVER cvs),
     PAIR = STRUCT (COVER a, b, REF PAIR next),
     COVER = UNION (BOOL, {TRUE = complete, FALSE = incomplete}
                    REF CONSTRUCT,
                    REF PAIR
                   );

REF PAIR no_pair = NIL;

PROC construct = (REF[]COVER cvs)COVER:
  (HEAP CONSTRUCT c; cvs OF c := cvs; c);

PROC incomp = (COVER c)BOOL: CASE c IN (BOOL b): NOT b OUT FALSE ESAC;
{test for incomplete cover}

PROC comp = (COVER c)BOOL: CASE c IN (BOOL b): b OUT FALSE ESAC;
{test for complete cover}

PROC coverage = (PATTERN patt)COVER:
CASE patt
IN (VOID): TRUE {unit constant is complete}
 , (BOOL b): {Boolean special constants}
     (HEAP[1:2]COVER valcover := (FALSE, FALSE);
      valcover[ABS b + 1] := TRUE;
      construct(valcover)
     )
 , (INT): FALSE {integer indicates special constant which is incomplete}
 , (SHORT REAL): FALSE {...}
 , (LINE): FALSE {...}
 , (VAR): TRUE {Variables are complete}
 , (REF CONSTANT c):
     (REF TYCON tycon = tycons[tyconno OF c];
      INT nocons = nocons OF tycon;
      IF nocons /= 1
      THEN HEAP[1:nocons]COVER valcover;
           FOR i TO nocons DO valcover[i] := FALSE OD;
           valcover[consno OF c] := TRUE;
           construct(valcover)
      ELSE {datatype contains only one value, and this must be it}
           TRUE
      FI
     )
 , (REF CONSPATT cp):
     (REF TYCON tycon = tycons[tyconno OF cp];
      INT nocons = nocons OF tycon;
      COVER parcover = coverage(par OF cp);
      IF incomp(parcover) {test incomplete}
      THEN FALSE
      ELIF nocons = 1
           ANDTH comp(parcover) {test complete}
      THEN TRUE
      ELSE HEAP[1:nocons]COVER valcover;
           FOR i TO nocons DO valcover[i] := FALSE OD;
           valcover[consno OF cp] := parcover;
           construct(valcover)
      FI
     )
 , (REF REFPATT rp):
     coverage(patt OF rp) {as there is only one ref constructor}
 , (REF LAYERED l):
     coverage(patt OF l) {the equivalent pattern}
 , (REF LISTPATT lp):
     (HEAP[1:2]COVER valcover := (FALSE, FALSE);
      IF lp IS nolistpatt {test nil}
      THEN valcover[1] := TRUE
      ELSE {:: is a constructor of a 2-tuple; ('a * 'a list) -> 'a list}
           HEAP PAIR pair := (coverage(hd OF lp), coverage(tl OF lp), NIL);
           valcover[2] := pair
      FI;
      construct(valcover)
     )
 , (REF TUPLEPATT tp):
     (REF VECTOR[]PATTERN patts = patts OF tp; INT size = UPB patts;
      COVER a = coverage(patts[1]),
            b = IF size = 2
                THEN coverage(patts[2])
                ELSE TUPLEPATT tltp; patts OF tltp := patts[2:];
                     coverage(tltp)
                FI;
      HEAP PAIR := (a, b, NIL)
     )
ESAC;

PROC intersection = (COVER c1, c2)COVER:
CASE c2
IN (BOOL b): IF b THEN c1 ELSE FALSE FI
OUSE c1
IN (BOOL b): IF b THEN c2 ELSE FALSE FI
   {the case elements above implement IPrimitive,
    the next one implements IConstructions}
 , (REF CONSTRUCT cs1):
     CASE c2
     IN (REF CONSTRUCT cs2):
          (BOOL incomplete := TRUE;
           []COVER f1 = cvs OF cs1, f2 = cvs OF cs2;
           INT size = UPB f1;
           HEAP[1:size]COVER f;
           FOR i TO size
           DO IF NOT incomp(f[i] := intersection(f1[i], f2[i]))
              THEN incomplete := FALSE
              FI
           OD;
           IF incomplete THEN FALSE ELSE construct(f) FI
          )
     ESAC
 , (REF PAIR p1): {This element implements ITuples}
     CASE c2
     IN (REF PAIR p2):
          (REF PAIR pp1, pp2 := p2, {pointers into the lists}
                    fprime := NIL;   {the result}
           WHILE pp2 ISNT no_pair {test not end of list}
           DO pp1 := p1;
              WHILE pp1 ISNT no_pair
              DO COVER a = intersection(a OF pp1, a OF pp2),
                       b = intersection(b OF pp1, b OF pp2);
                 IF NOT incomp(a)
                    ANDTH NOT incomp(b)
                 THEN fprime := HEAP PAIR := (a, b, fprime)
                 FI;
                 pp1 := next OF pp1
              OD;
              pp2 := next OF pp2
           OD;
           IF fprime IS no_pair
           THEN FALSE
           ELSE fprime
           FI
          )
     ESAC
ESAC;

PROC difference = (COVER c1, c2)COVER:
CASE c2
IN (BOOL b): IF b THEN FALSE ELSE c1 FI
OUSE c1
IN (BOOL b):
     IF b
     THEN CASE c2
          IN (REF CONSTRUCT cs2):
               {c1 complete c2 constructor}
               (REF[]COVER f2 = cvs OF cs2;
                [1:UPB f2]COVER f1;
                FOR i TO UPB f1 DO f1[i] := TRUE OD;
                difference(construct(f1), c2)
               )
           , (REF PAIR p2):
               {c1 complete c2 tuple}
               (PAIR p1 := (TRUE, TRUE, NIL);
                difference(p1, c2)
               )
          ESAC
     ELSE FALSE
     FI
 , (REF CONSTRUCT cs1):
     CASE c2
     IN (REF CONSTRUCT cs2):
          {DConstructions}
          (BOOL incomplete := TRUE;
           REF[]COVER f1 = cvs OF cs1, f2 = cvs OF cs2;
           INT size = UPB f1;
           HEAP[1:size]COVER f;
           FOR i TO size
           DO IF NOT incomp(f[i] := difference(f1[i], f2[i]))
              THEN incomplete := FALSE
              FI
           OD;
           IF incomplete THEN FALSE ELSE construct(f) FI
          )
     ESAC
 , (REF PAIR p1):
     CASE c2
     IN (REF PAIR p2):
          (REF PAIR pp1, pp2 := p2, {pointers into the lists}
                    fprime := NIL;   {the result}
           WHILE pp2 ISNT no_pair {test not end of list}
           DO pp1 := p1;
              WHILE pp1 ISNT no_pair
              DO COVER x = difference(a OF pp1, a OF pp2),
                       y = difference(b OF pp1, b OF pp2),
                       z = intersection(a OF pp1, a OF pp2);
                 IF NOT incomp(x)
                 THEN fprime := HEAP PAIR := (x, b OF pp1, fprime)
                 FI;
                 IF NOT incomp(y) ANDTH NOT incomp(z)
                 THEN fprime := HEAP PAIR := (z, y, fprime)
                 FI;
                 pp1 := next OF pp1
              OD;
              pp2 := next OF pp2
           OD;
           IF fprime IS no_pair
           THEN FALSE
           ELSE fprime
           FI
          )
     ESAC
ESAC;

MODE UNIONRES = STRUCT (COVER c, BOOL differs);
{This is what union will actually deliver. The boolean says whether c differs
 from c2}

PROC union = (COVER c1, c2)UNIONRES:
CASE c2
IN (BOOL b):
     IF b
     THEN {c2 complete, result unchanged} (TRUE, FALSE)
     ELSE {c2 incomplete, result changed unless c1 incomplete}
          (c1, NOT incomp(c1))
     FI
OUSE c1
IN (BOOL b):
     IF b
     THEN {c1 complete, result changed unless c2 complete}
          (TRUE, NOT comp(c2))
     ELSE {c1 incomplete, result unchanged} (c2, FALSE)
     FI
 , (REF CONSTRUCT cs1):
     CASE c2
     IN (REF CONSTRUCT cs2):
          {UConstructions}
          (BOOL complete := TRUE, changed := FALSE;
           []COVER f1 = cvs OF cs1, f2 = cvs OF cs2;
           INT size = UPB f1;
           HEAP[1:size]COVER f;
           FOR i TO size
           DO UNIONRES res := union(f1[i], f2[i]);
              changed := changed OR differs OF res;
              IF NOT comp(f[i] := c OF res)
              THEN complete := FALSE
              FI
           OD;
           IF complete
           THEN {it must have changed} (TRUE, TRUE)
           ELIF changed
           THEN (construct(f), TRUE)
           ELSE (c2, FALSE)
           FI
          )
     ESAC
 , (REF PAIR p1):
     CASE c2
     IN (REF PAIR p2):
          {UTuples}
          (REF PAIR pp1, pp2 := p2, {pointers into the lists}
                    fprime := NIL;   {the result}
           BOOL changed := FALSE;
           COVER completed := FALSE;
           PROC add = (COVER a, b)VOID:
             (changed := TRUE;
              IF comp(b)
              THEN completed := c OF union(a, completed)
              ELSE fprime := HEAP PAIR := (a, b, fprime)
              FI
             );
           WHILE pp2 ISNT no_pair {test not end of list}
           DO pp1 := p1;
              COVER remainder := a OF pp2;
              WHILE pp1 ISNT no_pair
              DO COVER x = difference(a OF pp1, a OF pp2),
                       y = intersection(a OF pp1, a OF pp2);
                 UNIONRES res = union(b OF pp1, b OF pp2);
                 COVER z = c OF res;
                 IF differs OF res
                 THEN IF NOT incomp(x)
                      THEN add(x, b OF pp1)
                      FI;
                      IF NOT incomp(y) ANDTH NOT incomp(z)
                      THEN add(y, z)
                      FI
                 FI;
                 remainder := difference(a OF pp2, a OF pp1);
                 pp1 := next OF pp1
              OD;
              IF NOT incomp(remainder) THEN add(remainder, b OF pp2) FI;
              pp2 := next OF pp2
           OD;
           IF NOT changed
           THEN (c2, FALSE)
           ELIF comp(completed)
           THEN (TRUE, TRUE)
           ELSE IF NOT incomp(completed)
                THEN fprime := HEAP PAIR := (completed, TRUE, fprime)
                FI;
                (fprime, TRUE)
           FI
          )
     ESAC
ESAC;

PROC addpatt = (PATTERN p, COVER c2)COVER:
{Equivalent to Check_op_2}
(COVER c1 = coverage(p);
 UNIONRES res = union(c1, c2);
 IF NOT incomp(c1) ANDTH NOT differs OF res
 THEN warn("This pattern is redundant")
 FI;
 c OF res
)

KEEP COVER, coverage, union, addpatt, comp
FINISH


8  Conclusions 


Before discussing the implications of this case study, it is worth emphasising the
fact that this report is about refinement, not about the precise form of the pattern
matching algorithm to use in ML. Other studies, for example Baudinet and
MacQueen [1987], Peyton Jones [1987], give algorithms for compiling pattern
matching expressions, but the problem of demonstrating the correctness of the
algorithm still remains. Given this approach to the formal development of software,
therefore, the questions arise as to whether it is actually helpful to the developers
and whether it actually convinces the evaluators.

Taking the developer's point of view first of all, when reading this case study, one
motors along quite happily through sections 1 to 4 and then the going gets rough in
section 5 and finally one gets bogged down in section 6. Section 5 is difficult
because the problem is difficult. Although the intuitive idea of an exhaustive check
is clear, it is extremely hard to think of a general algorithm which convincingly
copes with all cases. In addition, intuitively one thinks that the type structure must
play a relatively minor role in the algorithm, but equally it must be present in the
specification and the problem is to see what assumptions can be made about it.
Given that, the difficulty in section 5 is quite understandable and this approach,
namely the development of the abstraction invariant and its use in calculating the
implementation required, is a sensible way of designing software. Note incidentally
that the "bottom-up" approach of designing the software and then attempting to show
that it satisfies the specification is dangerous. Our work on this study was impeded
by several months fruitlessly trying to prove that our intuitive algorithm was correct
when it was not.

Given  the  investment  in  abstract  design,  represented  by  sections  1  to  5,  the 
implementation  becomes  very  easy.  The  implementation  in  section  7  was  written, 
tested  and  debugged  in  1  day.  (Debugged?  Yes,  CTS  wrote  TRUE  instead  of  FALSE 
and  forgot  to  move  on  the  list  pointers  in  some  of  the  loops.  These  faults  will  not  be 
found  by  the  formal  method  without  much  more  formalism,  but  are  relatively  easily 
found  by  testing.)  One  can  argue  therefore  that  abstract  algorithm  design  is  cost 
effective  as  a  production  method,  but  this  does  not  apply  to  the  formal  proof  aspects 
detailed  in  section  6.  This  is  because  the  actual  development  of  the  proof  is 
demanding.  And  it  is  demanding  because  it  is  boring  and  it  is  boring  because  it  does 
not  lead  to  a  much  greater  insight  into  the  problem. 

From  the  evaluator's  point  of  view  one  can  take  a  similar  warm  view  to  the  process  of 
abstract  algorithm  design  exemplified  by  the  first  5  sections,  because  these  help  the 
evaluator  to  understand  the  algorithm.  Indeed,  it  is  probably  the  only  approach  from 
which  one  can  derive  a  convincing  argument  for  correctness  and  consequently  the 
only  approach  likely  to  satisfy  an  independent  evaluation.  It  is  however  not  clear  the 
extent  to  which  the  proof  process  adds  further  assurance:  at  some  points  it  is  hard  to 
see  what  is  going  on  through  all  the  detail  of  the  formalism.  Furthermore,  it  seems 
inevitable  that  the  development  will  be  informal  at  some  stage,  simply  by  reason  of 
the  number  of  proofs  required.  Section  6  covers  only  an  outline  of  the  major  proof
steps,  so  one  could  anticipate  a  proof  document  for  what  is  a  small  part  of  an  ML
compiler  being  perhaps  10  times  larger  than  this  section.  Moreover,  this  section  is 
notable  for  what  is  omitted  rather  than  what  is  included.  At  the  higher  level  the 
formalisation  of  how  the  compiling  functions  are  called  has  been  omitted  and  at  the 
lower  level,  there  is  at  least  as  much  operation  refinement  required  to  end  up  with 
the  implementation  language  as  the  data  refinement  needed  to  end  up  with  the 
design.  The  sheer  number  of  proof  steps  generated  makes  it  seem  inconceivable  that
the  pattern  matching  algorithm  in  an  ML  compiler  will  ever  be  carried  out  fully 
formally  and  mechanically  checked. 

It  is  not  however  clear  that  mechanical  proof  is  always  called  for  in  the  very  highest 
levels  of  assurance.  Using  the  DoD's  evaluation  criteria  as  an  example  [DoD  1985],  one 
could  equate  the  development  recorded  in  sections  1  to  5  with  the  assurance  level 
contained  in  the  A1  criteria.  The  specification  corresponds  to  the  policy  model; 
section  5  corresponds  to  the  formal  top  level  specification;  the  combination  of 
formal  and  informal  techniques  showing  the  correspondence  with  the  specification 
is  contained  in  sections  3  to  5.  The  formalism  has  been  mechanically  checked  for 
correct  syntax  and  well-typing;  the  question  is  whether  that  is  enough  to  satisfy  A1
assurance,  or  the  equivalent  UK  confidence  level  6,  assured  design.

Rather  than  answer  this  contentious  question  directly,  we  shall  discuss  two  related 
ones,  namely,  whether  the  proof  process  of  section  6  materially  adds  to  the 
confidence  in  the  software,  and  whether  it  is  necessary  to  supply  the  missing 
formalism.  Taking  the  latter  question  first,  we  do  not  believe  that  much  would  be 
gained  by  formalising  the  syntax  analyser  part  of  an  ML  compiler  and  relating  it  to 
the  compiling  operation  specifications  so  that  the  pre-conditions  for  these  operations 
were  derived  formally,  rather  than  informally  as  has  been  done  here.  Syntax  analysis 
is  normally  treated  with  a  special  purpose  tool  anyway  and  the  pre-conditions  are 
relatively  easily  checked  by  eye.  The  further  refinement  involved  in  arriving  at  the 
Algol  68  implementation  is  more  debatable.  For  the  most  part  this  is
straightforward,  although  one  would  like  to  see  some  code  level  verification  of  the
loops  in  the  union  function.  Note  that  the  design  specification  is  suggestive  of  the 
annotations  which  would  be  necessary  for  verification  condition  generation  by  tools 
such  as  MALPAS  and  Gypsy. 

Whether  the  proof  in  general  is  adding  assurance  is  a  more  subjective  judgement.  To 
give  it  an  appropriate  subjective  context  one  could  rephrase  the  question  thus.  Given 
that  one  is  going  to  entrust  one's  life  to  a  piece  of  software,  which  will  be  written  to  a 
fixed  price  by  contractor  A  and  evaluated  to  a  fixed  price  by  contractor  B,  what 
should  A  and  B  be  required  to  do?  Within  this  context  the  proof  process  is  valuable 
because  it  gives  an  objective  test  for  satisfaction.  Mechanical  aids  are  important  in 
this,  but  their  use  should  not  cloud  the  understanding  of  the  proof  in  the  evaluator's  mind. 
Mechanical  proof  should  support  the  kind  of  informal  proof  common  in  mathematics, 
rather  than  supplanting  it.  In  practice,  the  theorems  contained  in  section  6  are  at  a
level  of  detail  which  should  be  within  the  range  of  a  mechanical  theorem  prover  and
at  which  an  evaluator  could  well  take  the  mechanical  proof  on  trust.

Even  with  mechanical  aids,  it  is  unlikely  that,  for  realistic  problems,  all  the  theorems
exhibited  can  be  proved  mechanically.  For
this  particular  case  study,  going  down  to  this  level  of  detail  gives  rise  to  at  least
100  theorems.  Requiring  the  proof  of  all  these  theorems  to  be  carried  out
mechanically  for  certification  would  be  unreasonably  costly.  However,  a  selective
approach  is  sensible  and  enables  the  level  of  assurance  to  be  related  to  the  amount 
of  effort  spent  in  verification.  For  example,  in  this  case  study,  the  theorems  are 
arranged  in  a  tree;  the  refinement  obligations  are  expressed  by  two  theorems,  which 
are  broken  down  into  9,  which  expand  into  the  100  or  so.  A  satisfactory  technique
would  be  to  prove  that  the  9  theorems  entailed  the  refinement  obligations,  thus
demonstrating  that  all  cases  had  been  considered,  and  then  to  take  one  of  the
branches  of  the  proof  tree,  perhaps  for  the  harder  tuple  case,  down  to  the  leaves  of 
the  tree.  Even  this  might  be  too  onerous  and  an  evaluator  could  be  satisfied  with 
proving  some,  but  not  all,  of  the  theorems  on  the  way,  simply  in  order  to  make  the 
most  cost-effective  use  of  his  time.  A  mechanical  prover  could  help  here  by  proving 
certain  of  the  easier  cases,  so  that  these  could  simply  be  accepted  by  the  evaluator. 
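
The selective strategy can be made concrete with a small sketch. The Python below is illustrative only and is not part of the case study. The report gives just the totals (two obligations, 9 intermediate theorems, roughly 100 at the leaves), so the 4/5 split of the nine between the two obligations, the 11 leaves per case and all the names are hypothetical assumptions, chosen to give 99 leaves. Each theorem is counted as one proof step: a leaf is proved directly, and any other theorem is proved by showing that its children entail it.

```python
from dataclasses import dataclass, field


@dataclass
class Theorem:
    name: str
    children: list = field(default_factory=list)

    def count(self):
        """Number of theorems in this subtree, including this one."""
        return 1 + sum(child.count() for child in self.children)


def build_tree(split=(4, 5), leaves_per_case=11):
    """Two obligations -> nine case theorems -> about a hundred leaves.

    The split and the number of leaves per case are hypothetical.
    """
    obligations = []
    case_no = 0
    for i, n_cases in enumerate(split, start=1):
        cases = []
        for _ in range(n_cases):
            case_no += 1
            leaves = [Theorem(f"leaf {case_no}.{k}")
                      for k in range(1, leaves_per_case + 1)]
            cases.append(Theorem(f"case {case_no}", leaves))
        obligations.append(Theorem(f"obligation {i}", cases))
    return obligations


def selective_plan(obligations, chosen_case):
    """Prove that the case theorems entail each obligation, then follow
    one chosen branch all the way down to its leaves."""
    plan = [ob.name for ob in obligations]
    for ob in obligations:
        for case in ob.children:
            if case.name == chosen_case:
                plan.append(case.name)
                plan.extend(leaf.name for leaf in case.children)
    return plan


obligations = build_tree()
full = sum(ob.count() for ob in obligations)   # every theorem proved
plan = selective_plan(obligations, "case 9")   # one branch only
print(f"full proof: {full} steps; selective plan: {len(plan)} steps")
```

Under these assumed figures the selective plan needs about an eighth of the proof steps of the full tree, while still showing that every case has been considered at the top level. Which branch to follow, and how deep, is the judgement the evaluator has to make.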

To  summarise: 

•  The  formal  development  process,  in  particular  the  key  step  of  exhibiting  the 
abstraction  invariant  and  using  it  to  develop  the  abstract  algorithm  design,  is  a 
useful  and  cost-effective  technique  for  developing  software,  quite  apart  from  its
use  in  demonstrating  correctness. 

•  Formal  proof  adds  to  the  assurance,  but  complete  formality  obscures.  Proof 
needs  to  be  undertaken  within  a  context  using  both  formal  and  informal 
elements.  The  structure  of  the  proof  and  the  way  it  is  delivered  to  an  evaluator 
(the  proof  document)  ought  to  follow  intuitive,  informal,  proof  methods.